Slicing and Dicing on the Command Line

If you don’t know text, you don’t know Linux. Linux offers a host of methods for reformatting plain text, including text that comes from graphical applications like spreadsheets and email programs.

Plain text is a series of characters delimited into lines by newline (LF, line feed) characters. You can send this text directly to a terminal window with a utility like cat(1). There are no hidden formatting codes; it’s “just the text, ma’am.”
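A quick way to confirm that plain text holds nothing but characters and newlines is to dump it byte by byte with od -c. In this minimal sketch the two-line sample and the file name sample.txt are made up for illustration:

```shell
#!/bin/sh
# Create a hypothetical two-line file: two runs of characters, each ended by LF.
printf 'just the text\nnothing else\n' > sample.txt

# cat sends the bytes straight to the terminal; each LF starts a new line.
cat sample.txt

# od -c shows every byte; the only "formatting" is the \n at each line end.
od -c sample.txt

# wc -l counts newline characters, which here equals the number of lines.
wc -l < sample.txt
```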

Before the puns get any worse, let’s dig in!

Quick Review

As last month’s column explained (it’s worth a look if you missed it), to start a new line at any point in plain text, simply insert a newline character. To join two lines, remove the newline between them, perhaps replacing it with a space or TAB character to keep the pieces apart.
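Both operations can be done from the command line. In this sketch (the sample words are invented), tr turns each space into a newline to split one line into several, and paste -s joins lines back into one, using the separator given with -d or a TAB by default:

```shell
#!/bin/sh
# Split: translate each space into a newline, so one line becomes three.
printf 'slice dice reassemble\n' | tr ' ' '\n'

# Join: paste -s merges all input lines into one; -d sets the separator.
printf 'slice\ndice\nreassemble\n' | paste -s -d ' ' -

# With no -d, paste -s separates the joined pieces with TABs (cat -t shows ^I).
printf 'slice\ndice\nreassemble\n' | paste -s - | cat -t
```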

When a terminal or printer reads a TAB character, it moves the current position to the next tabstop. TAB characters are also used as field separators; you can make a simple database with TABs between the fields and a newline at the end of each record.
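Here is a small sketch of both roles of the TAB, using a hypothetical two-record “database” in a file named tiny.txt. expand shows where the tabstops (every eight columns by default) put each field, and cut, which treats TAB as its default field delimiter, pulls out a single field:

```shell
#!/bin/sh
# A tiny hypothetical database: TABs between fields, a newline after each record.
printf 'AZ\tEly\tMayor\nCA\tAlma\tSheriff\n' > tiny.txt

# expand replaces each TAB with spaces up to the next tabstop (default: every 8 columns).
expand tiny.txt

# -t 4 puts the tabstops every 4 columns instead.
expand -t 4 tiny.txt

# cut splits on TAB by default; -f 2 prints the second field of each record.
cut -f 2 tiny.txt
```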

Linux utilities can also reformat text that doesn’t contain TABs. We’ll see examples of that, too.

Lots of Possibilities

Many GNU utilities are modeled on tools from the early days of Unix, back when a tty really was a teletype. Without a graphical display (or a graphical editor) to rearrange text, programmers came up with many ways to slice, dice, and reassemble data from scripts and the command line.

We’ll look at some of those techniques: enough of them, I hope, that newcomers will be ready to discover more on their own, and gurus will still find a few surprises.

Starting with a Spreadsheet

Plain text can come from lots of places, including:

  • The output of a utility (grep, for instance),
  • Text saved from an application (see Figure One for an example),
  • Text pasted into a terminal window from a graphical application, as in Figure Two near the end of this article.

Note that some of this text may contain characters that aren’t “plain.” For instance, if you copy from a web page designed by a Macintosh user, the designer may have unwittingly included the Macintosh encoding of a special character (perhaps a “curly quote”) that your Linux system doesn’t recognize.

For the first few examples, let’s use an OpenOffice.org spreadsheet file saved as plain text. (On the File menu, choose Save As and select the file type Text CSV.) Assuming that the data doesn’t contain any TAB characters, you can set the Field Delimiter to TAB and the Text Delimiter to none (delete the default quote mark in that dialog box). Figure One shows this.

Figure One: Saving a spreadsheet as plain text

Below are two views of the resulting file data.txt (renamed from the default data.csv). First, plain cat outputs the TAB characters between fields, which the terminal displays by moving to the next tabstop. Next, cat -tve shows what’s actually in the file:

$ cat data.txt
STATE	CITY	COUNTY	POP.	GOVT.
AZ	Ely	Gila	123	Mayor
CA	Alma	Lolo	345	Sheriff
TX	Leroy	El Paso	22	Bubba
$ cat -tve data.txt
STATE^ICITY^ICOUNTY^IPOP.^IGOVT.$
AZ^IEly^IGila^I123^IMayor$
CA^IAlma^ILolo^I345^ISheriff$
TX^ILeroy^IEl Paso^I22^IBubba$

Checking the data file with cat -tve or od -c is a good idea. They’ll reveal “hidden” or “non-plain” characters buried in the data. Notice the space character in the field El Paso. Because the field separator is a TAB, the space doesn’t cause any problems.
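If you don’t have the spreadsheet handy, the following sketch rebuilds data.txt from the listing above with printf (an assumption: the field values are copied from that listing) and then runs the checks just described. The last command is one way, assuming GNU grep in the C locale, to flag lines holding bytes outside printable ASCII, such as a pasted curly quote:

```shell
#!/bin/sh
# Recreate data.txt from the listing above: TAB between fields, LF after records.
printf 'STATE\tCITY\tCOUNTY\tPOP.\tGOVT.\n' >  data.txt
printf 'AZ\tEly\tGila\t123\tMayor\n'          >> data.txt
printf 'CA\tAlma\tLolo\t345\tSheriff\n'       >> data.txt
printf 'TX\tLeroy\tEl Paso\t22\tBubba\n'      >> data.txt

# Show TABs as ^I and line ends as $, matching the cat -tve listing above.
cat -tve data.txt

# Dump the El Paso record byte by byte; \t marks each TAB, the space prints as-is.
sed -n 4p data.txt | od -c

# In the C locale every byte above 127 falls outside [:print:] and [:space:],
# so any match points at a "non-plain" character.
LC_ALL=C grep -n '[^[:print:][:space:]]' data.txt || echo "no unusual bytes found"
```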

Utilities that Understand TABs

Scripting languages (Perl, awk, …) can parse and write TAB-separated data. Table One lists some other Linux utilities that handle TABs.
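As an example of a scripting language doing this, awk reads TAB-separated fields once -F sets the field separator to \t. This sketch uses the three data rows from data.txt, emitted by a helper shell function named rows (a name chosen here for illustration). It totals the POP. column, prints one county to show that the space in El Paso stays inside its field, and swaps two columns while keeping TABs on output:

```shell
#!/bin/sh
# Hypothetical helper: print the three data rows of data.txt, TAB-separated.
rows() {
    printf 'AZ\tEly\tGila\t123\tMayor\n'
    printf 'CA\tAlma\tLolo\t345\tSheriff\n'
    printf 'TX\tLeroy\tEl Paso\t22\tBubba\n'
}

# -F '\t' makes TAB the only field separator; sum field 4 (POP.).
rows | awk -F '\t' '{ total += $4 } END { print total }'

# Print field 3 (COUNTY) for Texas; the embedded space is not a separator here.
rows | awk -F '\t' '$1 == "TX" { print $3 }'

# OFS sets the output separator: swap CITY and STATE, keeping TABs between them.
rows | awk -F '\t' -v OFS='\t' '{ print $2, $1 }'
```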

Table One: Some utilities that understand TABs

Utility                  Description
cut(1)                   Remove sections from each line of files
echo(1), printf(1)       Write arguments to standard output (\t makes a TAB; bash’s echo needs -e)
expand(1), unexpand(1)   Convert TABs to spaces, spaces to TABs
paste(1)                 Merge lines of files into TAB-separated output
sed(1)                   Stream editor
sort(1)                  Sort data by one or more of its fields
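A few of the utilities in Table One can be tried on data.txt right away. The sketch below rebuilds the file so it runs on its own; states.txt and govts.txt are scratch files named here only for illustration. One design point is worth noting. By default, sort splits fields at each transition from blank to non-blank characters, so a field like El Paso would split at its space. Passing a literal TAB with -t avoids that. By contrast, cut and paste already use TAB as their default delimiter.

```shell
#!/bin/sh
# Recreate data.txt from the listing above (TAB between fields).
printf 'STATE\tCITY\tCOUNTY\tPOP.\tGOVT.\n' >  data.txt
printf 'AZ\tEly\tGila\t123\tMayor\n'          >> data.txt
printf 'CA\tAlma\tLolo\t345\tSheriff\n'       >> data.txt
printf 'TX\tLeroy\tEl Paso\t22\tBubba\n'      >> data.txt

tab=$(printf '\t')   # a literal TAB, for options that need one

# cut: keep fields 2 and 4 (CITY, POP.); the output stays TAB-separated.
cut -f 2,4 data.txt

# sort: skip the header line, then sort numerically on field 4 only.
tail -n +2 data.txt | sort -t "$tab" -k 4,4n

# paste: rebuild a two-column file from two single-column files.
cut -f 1 data.txt > states.txt
cut -f 5 data.txt > govts.txt
paste states.txt govts.txt

# expand: line the columns up with tabstops every 10 characters.
expand -t 10 data.txt
```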

Whether your data comes from a spreadsheet or some other source, if you can massage your data into TAB-separated fields, the examples below can help you slice and dice it. Examples toward the end of the article cover other types of data.
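Two common ways to massage data into TAB-separated fields are sketched below with tr. Each makes an assumption. The first assumes simple comma-separated data with no quoted fields and no commas inside a field. The second assumes columns are separated by runs of spaces and that no field contains a space of its own. A field like El Paso would be split in two by the second one-liner, so check the result with cat -tve. The sample lines are invented.

```shell
#!/bin/sh
# Simple CSV (assumes no quoted fields, no commas inside fields): comma -> TAB.
printf 'AZ,Ely,Gila\nCA,Alma,Lolo\n' | tr ',' '\t' | cat -tve

# Space-aligned columns: translate spaces to TABs; with two sets, -s then
# squeezes each run of repeated TABs produced by the translation into one.
printf 'AZ    Ely     Gila\nCA    Alma    Lolo\n' | tr -s ' ' '\t' | cat -tve
```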
