
Trimming, Scripting, and Cutting


How can I get rid of those pesky Control-M characters at the end of each line in a file?

If you’re moving a text file from Windows to Linux, and unless your ftp program has a method of converting the Windows text to Unix, the file will have a CONTROL-M (^M) character at the end of each line. In many cases, the control characters don’t hurt anything, because CONTROL-M, or carriage return, is treated as white space, and is ignored by many programs. However, some programs, such as the Perl interpreter, are adversely affected.

You can remove the CONTROL-Ms with the dos2unix utility. If it’s installed on your system, just type dos2unix -n old_filename new_filename or dos2unix -o filename. The former command writes the converted text to a new file; the latter replaces the original file with the converted version. Adding -k to either form keeps the original file’s timestamp.

If dos2unix isn’t available, you can also use vi to fix this problem. Edit the file in vi and enter the command :%s/^M$//. The colon tells vi to allow you to enter an ex command; the percent sign indicates you want to perform the command on all lines of the file; and the substitute command, s///, replaces the text between the first and second slash with the text between the second and third slash. In our case, a ^M at the end of the line ($) is replaced with nothing.

For more about ex commands, see this month’s “Power Tools” column, page 34.

There is a catch here: you can’t type the ^M as a caret followed by an M. Press CONTROL-V first, then CONTROL-M. The ^V tells vi to accept the next character literally.
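Outside an editor, the same end-of-line substitution can be sketched in the shell. The example below uses sed, which the column does not mention, so treat it as one possible alternative rather than the recommended method. The scratch directory and the sample file names are made up for illustration.

```shell
# Work in a scratch directory so no real files are touched.
# (workdir and the sample file names are hypothetical.)
workdir=$(mktemp -d)

# Build a two-line sample with Windows-style CRLF line endings.
printf 'first line\r\nsecond line\r\n' > "$workdir/sample_dos.txt"

# Command substitution strips trailing newlines but not carriage
# returns, so this leaves a single literal CR in $cr.
cr=$(printf '\r')

# Same idea as the vi command: delete a CR only at the end of a line.
sed "s/${cr}\$//" "$workdir/sample_dos.txt" > "$workdir/sample_unix.txt"

# Count the lines that still contain a CR in each file.
# grep exits non-zero when the count is 0, hence the "|| true".
dos_count=$(grep -c "$cr" "$workdir/sample_dos.txt" || true)
unix_count=$(grep -c "$cr" "$workdir/sample_unix.txt" || true)
echo "lines with CR before: $dos_count, after: $unix_count"
```

Because the CR is generated with printf rather than typed, there is no need for the CONTROL-V trick here, and the command can be dropped into a script.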

I’d like to learn how to script. I’ve heard of ksh, bash, Bourne Shell, C Shell, Perl, Tcl/Tk, and others. What should I learn?

The answer depends on what you want to accomplish. Are you doing web programming, or are you developing operating system utilities? Are you going to build graphical user interfaces (GUIs)? How large is the project? With some requirements in mind, you can weigh your options.

The list of available scripting languages is long. Of the more commonly installed scripting languages on a Linux system, bash, ksh, and Bourne Shell (sh) are all very closely related. Of these three, bash is probably the most common.

If you plan on doing any system-related work, learning the basics of bash is highly recommended. For example, all of the system startup scripts under /etc/init.d/ and /etc/rc?.d/ are typically written in some flavor of sh that bash can interpret.

The C Shell (which includes csh and tcsh) has a scripting language closer to the C programming language, but due to inefficiencies at execution time, it’s considered an inferior scripting language for systems-related work. However, for quick and dirty scripts, it works well, especially if you already know C.

All of the languages mentioned so far are best suited to text input and narrowly focused tasks.

Perl is arguably the “Swiss Army knife” of scripting languages. It’s used heavily in web environments; it has modules to create and program GUIs; and it’s an effective scripting language for large projects. If you’re a system administrator, Perl is an excellent second choice, after bash. If you’re more of a developer, Perl might be a good first choice. The only caveat is that, with so many Perl modules available, the learning curve is steep. It is well worth the effort, though, especially if you learn Perl’s object-oriented (OO) features.

Tcl/Tk is a GUI-oriented language with many similarities to bash. In an X-based environment, it’s a very useful tool, but it isn’t seen very often in the commercial world. If you’re interested in bash, but need a GUI, look at Tcl/Tk.

With dozens of scripting languages available, it’s impossible to cover each one. Take the questions we began with, and consider what types of tasks, interfaces, and features you need or want. From there, use the Web and ask other Linux/Unix users to find the language that fits your needs.

If you simply wish to jump in, learn bash first, then Perl. At the very worst, you will have two more scripting languages that you can put on a resume.

Is there an easy way to process delimited files?

Delimited files are text files that use some special character, typically a comma or a TAB, to separate the fields of a record. If you need to extract some fields and ignore others (a common task on Linux machines), try cut. Using cut, you can extract one or many contiguous or discontiguous fields from a delimited record, and even extract individual bytes. For example, to extract the username and the shell from /etc/passwd (the first and seventh fields, respectively), use the command cut -d ':' -f 1,7 /etc/passwd. If you omit the filename argument or give - in its place, cut reads stdin, making it a nice filter.
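The field and byte selections described above can be tried safely on a small sample rather than on the real /etc/passwd. This is a minimal sketch: the scratch directory, the file name passwd_sample, and the two user records are invented for illustration.

```shell
# Build a tiny passwd-style file in a scratch directory.
# (workdir, passwd_sample, and both records are hypothetical.)
workdir=$(mktemp -d)
sample="$workdir/passwd_sample"
printf '%s\n' \
  'alice:x:1000:1000:Alice Example:/home/alice:/bin/bash' \
  'bob:x:1001:1001:Bob Example:/home/bob:/bin/sh' > "$sample"

# Discontiguous fields: username (field 1) and shell (field 7).
name_and_shell=$(cut -d ':' -f 1,7 "$sample")
echo "$name_and_shell"

# Contiguous range of fields, with cut reading stdin as a filter.
first_three=$(cut -d ':' -f 1-3 < "$sample")
echo "$first_three"

# Individual bytes: the first three bytes of every line.
prefix=$(cut -b 1-3 "$sample")
echo "$prefix"
```

The selected fields come out joined by the same delimiter that was given with -d. TAB is cut's default delimiter, so -d can be left off for tab-separated files.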



John R. S. Mascio is an IT consultant specializing in Linux and Open Systems. He can be reached at mascio@ryu.com.
