Hitting the Motherlode

Mining data for information?

Among computing flatfoots (DBAs, Web application developers, shell script programmers, data miners, system administrators, and that ilk), the terms delimiter, pattern, and munge (pronounced “munj”) are common and well known. Indeed, these and other even more mysterious phrases, such as backslash dee, caret dollar, and slash gee, get thrown around so much that they must sound like gibberish to the uninitiated.
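For readers who have not met the slang yet, here is a small sketch that decodes the three phrases. It is written in Python purely for illustration (the article does not name a language), and the sample text is invented. `\d` (“backslash dee”) matches one digit; `^` and `$` (“caret dollar”) anchor a pattern to the start and end of a line, so `^$` matches an empty line; `/g` (“slash gee”) is the global modifier in Perl and JavaScript. Python's `re` module has no `/g` flag, but `re.findall` and `re.sub` already act on every match.

```python
import re

# Invented sample text: two lines with numbers, separated by a blank line.
text = "Order 66 shipped\n\nOrder 7 pending"

# "Backslash dee": \d matches a single digit, so \d+ matches a run of digits.
# re.findall returns every non-overlapping match, roughly what the /g
# ("slash gee", global) modifier gives you in Perl or JavaScript.
numbers = re.findall(r"\d+", text)

# "Caret dollar": with re.MULTILINE, ^ and $ match at the start and end of
# each line, so the bare pattern ^$ matches only an empty line.
blank_lines = len(re.findall(r"^$", text, flags=re.MULTILINE))

# Perl's s/\d+/N/g replaces every match; re.sub does so by default.
masked = re.sub(r"\d+", "N", text)

print(numbers)
print(blank_lines)
print(masked)
```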

No, these expressions aren’t the refined phrasings of the cognoscenti, nor are they passwords of some secret society. They’re slang: shop talk about one of computing’s handiest tools, regular expressions (or regex).

In a nutshell, regular expressions are a powerful, compact, and concise shorthand for describing patterns of text. The shorthand, perhaps cryptic at first, is a hyper-minimalist solution to a big problem: “does this data conform to this format?”
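To make “does this data conform to this format?” concrete, the sketch below answers it twice for a string, once with hand-written checks and once with a single pattern. The format (an ISO-style date such as 2024-03-15) and the function names are hypothetical, chosen only for illustration, and the check covers the shape of the text, not whether the date actually exists.

```python
import re

# Hypothetical format for illustration: YYYY-MM-DD, shape only.
# re.ASCII limits \d to 0-9 so it agrees with the hand-written version.
DATE_SHAPE = re.compile(r"\d{4}-\d{2}-\d{2}", re.ASCII)


def conforms_by_hand(s: str) -> bool:
    """The same shape check written out without a regular expression."""
    if len(s) != 10 or s[4] != "-" or s[7] != "-":
        return False
    digits = s[:4] + s[5:7] + s[8:]
    return all(c in "0123456789" for c in digits)


def conforms_by_regex(s: str) -> bool:
    """fullmatch succeeds only if the entire string fits the pattern."""
    return DATE_SHAPE.fullmatch(s) is not None


for sample in ["2024-03-15", "2024-3-15", "15/03/2024", "2024-03-15x"]:
    print(sample, conforms_by_hand(sample), conforms_by_regex(sample))
```

The pattern states the format declaratively in one line, while the hand-written version has to spell out lengths and positions. Without `re.ASCII`, Python's `\d` also accepts other Unicode decimal digits.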

As it turns out, recognizing and parsing text are two of the most common problems solved with a computer. Think about it. Almost every computer application munges data in some way (“munge” means “take all of this data and extract something meaningful out of it”). Word processors search documents to find and correct double words (e.g., “the the”); email applications deconstruct messages to extract mail headers (e.g., “From”, “To”, “Bcc”); Web applications parse form data to validate input fields (e.g., a zip code or a telephone number).
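Each of the three chores above boils down to a pattern. The following Python sketch is illustrative only: the header names, the ZIP and phone formats, and the function names are assumptions, and real mail parsing (folded headers, CRLF line endings) or real phone validation needs more care than this.

```python
import re

# 1. Double words such as "the the": \b is a word boundary, (\w+) captures a
#    word, and the backreference \1 requires that same word to repeat.
#    IGNORECASE also catches "The the" at the start of a sentence.
DOUBLE_WORD = re.compile(r"\b(\w+)\s+\1\b", re.IGNORECASE)

# 2. Mail headers: "Name: value" lines before the first blank line.
#    Simplified: folded (continued) header lines are not handled.
HEADER = re.compile(r"^(From|To|Cc|Bcc|Subject):[ \t]*(.*)$", re.MULTILINE)

# 3. Form fields, with assumed formats: a US ZIP code (five digits,
#    optionally followed by -NNNN) and a loose ten-digit North American
#    phone number such as (555) 123-4567 or 555.123.4567.
ZIP = re.compile(r"\d{5}(?:-\d{4})?", re.ASCII)
PHONE = re.compile(r"\(?\d{3}\)?[ .-]?\d{3}[ .-]?\d{4}", re.ASCII)


def double_words(text: str) -> list[str]:
    """Return each word that is immediately repeated."""
    return DOUBLE_WORD.findall(text)


def mail_headers(message: str) -> dict[str, str]:
    """Map header names to values; headers end at the first blank line."""
    head = message.split("\n\n", 1)[0]
    return dict(HEADER.findall(head))


def valid_zip(s: str) -> bool:
    return ZIP.fullmatch(s) is not None


def valid_phone(s: str) -> bool:
    return PHONE.fullmatch(s) is not None


if __name__ == "__main__":
    print(double_words("Paris in the the spring"))
    msg = "From: alice@example.com\nTo: bob@example.com\nSubject: Hi\n\nTo: body text"
    print(mail_headers(msg))
    print(valid_zip("90210"), valid_zip("90210-1234"), valid_zip("9021"))
    print(valid_phone("(555) 123-4567"), valid_phone("555-1234"))
```

The phone pattern is deliberately loose (it would also accept an unbalanced parenthesis); tightening it is a matter of adding alternatives, which is exactly the kind of trade-off that makes regular expressions worth learning well.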

If you have to munge data, regular expressions are indispensable. Indeed, once you know how to write…
