Slicing and Dicing on the Command Line

If you don't know text, you don't know Linux. There are a host of methods for reformatting plain text -- including the text used by graphical applications like spreadsheets and email programs.

Plain text is a series of characters delimited into lines by newline (LF, line feed) characters. You can send this text directly to a terminal window with a utility like cat(1). There are no hidden formatting codes; it’s “just the text, ma’am.”

Before the puns get any worse, let’s dig in!

Quick Review

As you saw in last month’s column (if you didn’t see the column, you might want to review it), to start a new line at any point in plain text, simply insert a newline character. To join two lines, remove the newline between them — and maybe add a space or TAB character to separate them.
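
As a quick refresher, here is a minimal sketch of joining and splitting lines from the command line. The sample words are made up for illustration, and paste and tr are just one of several ways to do this.

```shell
# Join two lines: paste -s merges all input lines into one,
# and -d ' ' puts a space where each removed newline was.
printf 'one\ntwo\n' | paste -s -d ' ' -

# Split one line in two: tr replaces the space with a newline.
printf 'one two\n' | tr ' ' '\n'
```

The first command prints the two words on a single line; the second prints them on separate lines.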

When a terminal or printer reads a TAB character, it moves the current position to the next tabstop. TAB characters are also used as field separators; you can make a simple database with TABs between the fields and a newline at the end of each record.
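
A simple database like that can be sketched with printf. The file name people.txt and its contents are hypothetical, chosen only for this illustration.

```shell
# Write two records; \t is a TAB between fields, \n ends each record.
# people.txt is a hypothetical file name used only for this sketch.
printf 'NAME\tCITY\n' > people.txt
printf 'Ann\tEly\n' >> people.txt

# Count the fields in the first record by splitting on TAB.
awk -F'\t' 'NR == 1 { print NF }' people.txt
```

Because the fields are separated by TABs rather than spaces, a field could contain spaces without confusing the field count.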

Linux utilities can also reformat text that doesn’t contain TABs. We’ll see examples of that, too.

Lots of Possibilities

Many GNU utilities started in the days of Unix — back when a tty really was a teletype. Without a graphical display (or a graphical editor) to rearrange text, programmers came up with many ways to slice, dice, and reassemble data from scripts and the command line.

We’ll see some of those ways: enough, I hope, that people new to this way of handling text will be ready to find others, and that gurus will still get a few surprises.

Starting with a Spreadsheet

Plain text can come from lots of places, including:

  • The output of a utility (grep, for instance),
  • Text saved from an application (see Figure One for an example),
  • Text pasted into a terminal window from a graphical application, as in Figure Two near the end of this article.

Note that some of this text may not be “plain” characters. For instance, if you’re copying from a web page designed by a Macintosh user, the designer may have unwittingly included the Macintosh encoding of a special character (maybe a “curly quote”) that isn’t recognized on your Linux system.

For the first few examples, let’s use an OpenOffice.org spreadsheet file saved as plain text. (On the File menu, choose Save As, type Text CSV.) Assuming that the data doesn’t contain any TAB characters, you can set the Field Delimiter to TAB and the Text Delimiter to none (delete the default quote mark in that dialog box). Figure One shows this.

Figure One: Saving a spreadsheet as plain text

Below are two views of the resulting file data.txt (renamed from the default data.csv). First, plain cat outputs the TAB characters between fields, which the terminal displays by moving to the next tabstop position. Next, cat -tve shows what’s actually in the file, with each TAB displayed as ^I and the end of each line marked with $:

$ cat data.txt
STATE	CITY	COUNTY	POP.	GOVT.
AZ	Ely	Gila	123	Mayor
CA	Alma	Lolo	345	Sheriff
TX	Leroy	El Paso	22	Bubba
$ cat -tve data.txt
STATE^ICITY^ICOUNTY^IPOP.^IGOVT.$
AZ^IEly^IGila^I123^IMayor$
CA^IAlma^ILolo^I345^ISheriff$
TX^ILeroy^IEl Paso^I22^IBubba$

Checking the data file with cat -tve or od -c is a good idea. They’ll reveal “hidden” or “non-plain” characters buried in the data. Notice the space character in the field El Paso. Because the field separator is a TAB, the space doesn’t cause any problems.
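
Here is a small sketch of the od -c check, along with one way to count the TABs in a line. The file sample.txt is a hypothetical one-record copy of the last line of data.txt, recreated with printf so the example stands alone.

```shell
# Recreate one record of the article's data (\t is TAB, \n is newline).
# sample.txt is a hypothetical file name used only for this sketch.
printf 'TX\tLeroy\tEl Paso\t22\tBubba\n' > sample.txt

# od -c prints each byte; TABs appear as \t and the newline as \n.
od -c sample.txt

# Count TAB characters: delete everything that is not a TAB,
# then count the bytes that remain.
tr -cd '\t' < sample.txt | wc -c
```

Five fields should give four TABs. A different count on some line is a hint that a field contains a stray TAB or that a separator is missing.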

Utilities that Understand TABs

Scripting languages (Perl, awk, …) can parse and write TAB-separated data. Table One lists some other Linux utilities that handle TABs.

Table One: Some utilities that understand TABs

Utility                  Description
cut(1)                   Remove sections from each line of files
echo(1), printf(1)       Write arguments to standard output (\t makes a TAB)
expand(1), unexpand(1)   Convert TABs to spaces, spaces to TABs
paste(1)                 Merge lines of files into TAB-separated output
sed(1)                   Stream editor
sort(1)                  Sort data by one or more of its fields

Whether your data comes from a spreadsheet or some other source, if you can massage your data into TAB-separated fields, the examples below can help you slice and dice it. Examples toward the end of the article cover other types of data.
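
As a first taste, here is a minimal sketch that runs two of the utilities from Table One against the sample data. It recreates data.txt with printf so that it stands alone. The particular fields chosen are just for illustration.

```shell
# Recreate the article's data.txt (\t is TAB, \n is newline).
printf 'STATE\tCITY\tCOUNTY\tPOP.\tGOVT.\n' > data.txt
printf 'AZ\tEly\tGila\t123\tMayor\n' >> data.txt
printf 'CA\tAlma\tLolo\t345\tSheriff\n' >> data.txt
printf 'TX\tLeroy\tEl Paso\t22\tBubba\n' >> data.txt

# cut splits on TAB by default: keep fields 2 (CITY) and 4 (POP.).
cut -f2,4 data.txt

# Skip the header line, then sort numerically on field 4 (POP.).
# sort -t needs a literal TAB; "$(printf '\t')" supplies one.
tail -n +2 data.txt | sort -t "$(printf '\t')" -k4,4n
```

The sort lists the TX record first, because 22 is the smallest population. Note that El Paso stays in one field even though it contains a space.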
