Everything is Unix

When you program in a higher-level language, it's easy to forget that lower-level Unix facilities can help in tricky situations. Here are a few examples to give you an idea of what you might be missing.

Recently there has been some chatter on various programming blogs about how we should be using classic Unix features to build more scalable infrastructure. This all started when Ryan Tomayko wrote “I like Unicorn because it’s Unix.” The gist of that post was that Eric Wong’s Unicorn, an HTTP server written in Ruby, performed extremely well in spite of its implementation language because Eric wasn’t afraid to drop down to the lower-level Unix system calls instead of using the language’s traditional higher-level abstractions.

Ryan does an excellent job of explaining exactly how following age-old Unix design patterns and using those system calls is the right way to go. He provided the code for a simple TCP “echo server” that can handle clients very efficiently. Not to be outdone, advocates of other popular scripting languages stepped forward with examples of doing the same thing. In “Python is Unix,” Jacob Kaplan-Moss provided a Python implementation of the same echo server. In “Perl is Unix,” Aristotle Pagaltzis presented a Perl implementation of a pre-forking echo server as well.

What you notice in each example is that the code is surprisingly readable and simple, letting the operating system do the really heavy lifting. In fact, they’re all very similar. Most of the real differences are syntactic sugar from the particular language. And that’s the whole point of these examples. Linux and Unix have some amazing built-in facilities for solving common problems and they’ve been around for a long time. But the reality is that many of the people coding in higher-level languages like Ruby, Python, or Perl may not even be aware of them. Making matters worse, as Ryan points out, is a lack of documentation. Some of the high-level languages (Ruby in particular) do a poor job of describing the low-level calls they expose or why you might use them. So if you don’t already have more than a basic understanding of Unix systems programming to fall back on, the odds are really not in your favor.

I won’t reiterate the benefits of the networking system calls that Ryan, Jacob, and Aristotle showcase in their examples. But I would like to consider a few other Unixisms that are often overlooked and can make some classes of problems easier to solve.

Multi-process with fork()

Ryan touches on this a bit in his post, but I’d like to draw a bit more attention to the power of the fork() system call. When you have a lot of work to do, using fork() to make one or more “clones” of your process so you can divide and conquer works quite well, especially on multi-core CPUs. Using waitpid(), the parent process has a reliable mechanism for waiting for all the workers to exit().
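As a rough illustration of that divide-and-conquer pattern, here is a minimal Python sketch. The names run_workers and work are made up for this example; the only real system interfaces used are os.fork(), os.waitpid(), and os._exit(). It assumes a Unix-like system and that each job is hashable so it can be used as a dictionary key.

```python
import os
import sys


def run_workers(jobs, work):
    """Fork one child per job, run work(job) in it, and reap every child.

    Returns a dict mapping each job to its child's exit code
    (0 = success, 1 = work() raised, -1 = killed by a signal).
    `run_workers` and `work` are hypothetical names for this sketch.
    """
    children = {}
    for job in jobs:
        # Flush so buffered output is not duplicated into the child.
        sys.stdout.flush()
        pid = os.fork()
        if pid == 0:
            # Child: do the work, then leave via os._exit() so we never
            # fall back into the parent's loop.
            try:
                work(job)
            except BaseException:
                os._exit(1)
            os._exit(0)
        children[pid] = job  # Parent: remember which child has which job.

    results = {}
    for pid, job in children.items():
        _, status = os.waitpid(pid, 0)  # Blocks until that child exits.
        if os.WIFEXITED(status):
            results[job] = os.WEXITSTATUS(status)
        else:
            results[job] = -1
    return results
```

Calling run_workers(["a.log", "b.log"], parse_file), where parse_file is whatever function does your real work, would give each file its own process and return only after both children have exited.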

One of the main benefits of fork() is that child processes will inherit almost everything from their parent. That can be especially helpful if there’s a large amount of identical information that each worker needs to have fast access to. The parent can read that data in before forking and each child will also inherit a copy. But thanks to modern copy-on-write techniques, each child ends up sharing the very same memory pages that the parent had, and a page is only duplicated when one of the processes writes to it. So if the parent reads in 512MB of data and then forks 10 children, you don’t need 5GB of RAM to support the children. Unless they start modifying the data, memory bloat will not be an issue.
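To make the inheritance concrete, here is a small Python sketch in which the parent builds a data set once and forked children read it directly, sending back only a small result over a pipe. The function name sum_in_children is invented for this example, and it assumes a Unix-like system and a list of integers.

```python
import os


def sum_in_children(data, n_workers):
    """Sum `data` by letting each forked child add up one slice of it.

    The children read the parent's already-loaded `data` directly; it is
    never re-read from disk or serialized. Only each partial sum travels
    back to the parent, over a pipe. (Hypothetical helper for this sketch.)
    """
    chunk = (len(data) + n_workers - 1) // n_workers
    children = []
    for i in range(n_workers):
        read_fd, write_fd = os.pipe()
        pid = os.fork()
        if pid == 0:
            # Child: compute a partial sum over its index range and report it.
            status = 1
            try:
                os.close(read_fd)
                start = i * chunk
                stop = min(start + chunk, len(data))
                part = sum(data[j] for j in range(start, stop))
                os.write(write_fd, str(part).encode())
                status = 0
            finally:
                os._exit(status)
        # Parent: keep only the read end of this child's pipe.
        os.close(write_fd)
        children.append((pid, read_fd))

    total = 0
    for pid, read_fd in children:
        with os.fdopen(read_fd) as pipe:
            total += int(pipe.read())  # Reads until the child closes its end.
        os.waitpid(pid, 0)
    return total
```

One caveat for interpreted languages: CPython updates a reference count inside every object it touches, so even read-only access dirties some pages, and the sharing is less perfect than it would be for a plain C array. The kernel still copies only the pages that are actually written to.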

The best part of all is that this is all completely automatic, on by default, and something you get for free. There’s no low-level programming required on your part to get these benefits. The operating system knows how to do these things and does them quite well.

Atomic File Operations and Locking

Often, when you have multiple processes all trying to fetch data from the same pool or perform the same service, you can end up with a race condition that is hard to recreate and debug. Multiple processes may each believe they have exclusive access to a given resource, duplicating each other’s work and potentially causing a myriad of problems that could be challenging to undo.

A classic Unix solution to this problem is to use an atomic file operation. The common choice is to use one of the atomic filesystem metadata operations, such as rename(). The basic operation is like this. If you have multiple processes all trying to get exclusive access to a resource, you can use a file to mediate that. Each process will try to create /tmp/data.lock and write its process id (PID) into it. The process that wins is then allowed to use the resource, removing the file when done. For added safety, processes should check to see if the current lock holder is alive. If not, they may treat the lock as stale and remove it.
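The “is the current lock holder alive?” check can be sketched in Python as follows. The helper names pid_is_alive and lock_is_stale are made up for this example; the underlying trick, sending signal 0 with kill(), performs the existence and permission checks without delivering any signal. The sketch assumes the lock file contains nothing but the holder’s PID.

```python
import os


def pid_is_alive(pid):
    """Return True if a process with this PID currently exists."""
    try:
        os.kill(pid, 0)  # Signal 0: error checking only, nothing is sent.
    except ProcessLookupError:
        return False     # No such process.
    except PermissionError:
        return True      # It exists, but belongs to another user.
    return True


def lock_is_stale(lock_path):
    """Return True only if the lock file names a process that is gone.

    Hypothetical helper: a missing or unparseable lock file is reported
    as "not stale" so that the caller never removes a lock it cannot
    positively identify as abandoned.
    """
    try:
        with open(lock_path) as f:
            pid = int(f.read().strip())
    except FileNotFoundError:
        return False  # Nobody holds the lock at all.
    except ValueError:
        return False  # Contents are not a PID; be conservative.
    if pid <= 0:
        return False  # 0 and negative values address process groups.
    return not pid_is_alive(pid)
```

Two things this sketch does not handle: PIDs are eventually reused, so a live PID is not proof that the original holder is still running, and removing a stale lock is itself a race if two processes notice the staleness at the same moment.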

But simply trying to create the file and write a PID into it is not atomic. You could check to see if the file exists, create it, and write to it. That’s the simple approach that sounds good on paper, but if multiple processes try to do that at once, you end up with a number of possible races. The traditional solution is for each process to create a uniquely named file such as /tmp/data.lock.$PID (though that can be improved too) and then use an atomic file operation that results in creating /tmp/data.lock. The two common choices for that atomic operation are rename() and link().

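The rename() call is what the Unix command mv uses under the hood. It atomically changes the file from its temporary name to the final name. Be aware, though, that rename() silently replaces a file that already exists at the final name, so by itself it cannot tell a process whether it won the race; it is better suited to atomically publishing a finished file. The link() call tries to create a new directory entry (a hard link) that references the same file, and it fails with EEXIST if the final name is already taken, which is exactly the test a lock file needs. In either case, the underlying file (the inode) does not change, only the metadata in the file system does. And those changes are atomic.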
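Here is a minimal Python sketch of the link() variant of this technique. The function names acquire_lock and release_lock are invented for this example, and it assumes a local filesystem that supports hard links. It uses link() because that call fails when the lock name already exists, which gives each process a clear win-or-lose answer.

```python
import os


def acquire_lock(lock_path):
    """Try to take the lock; return True on success, False if it is held.

    Hypothetical helper. Writes our PID to a uniquely named temporary
    file, then hard-links it to `lock_path`. The link either appears
    atomically, with the PID already inside, or fails because another
    process got there first.
    """
    tmp_path = f"{lock_path}.{os.getpid()}"
    with open(tmp_path, "w") as f:
        f.write(f"{os.getpid()}\n")
    try:
        os.link(tmp_path, lock_path)
        return True
    except FileExistsError:
        return False
    finally:
        # The temporary name is no longer needed either way; if we won,
        # the data lives on under lock_path.
        os.unlink(tmp_path)


def release_lock(lock_path):
    """Give up the lock by removing the lock file."""
    os.unlink(lock_path)
```

A caller would wrap its critical section in something like: if acquire_lock("/tmp/data.lock"), do the work inside a try block, and call release_lock("/tmp/data.lock") in the matching finally block.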

This works well as long as you’re not using NFS. That’s a whole can of worms all its own.

Others

Those are just a few examples of letting the OS do some of your heavy lifting in sticky situations. If you’ve been working in scripting languages for a while but haven’t spent much time looking at lower-level facilities like this, it might be worth making the time to do so. You might be surprised by how much you can learn and how much better your code could be. In addition to those described so far, I suggest learning more about signals and how your language of choice implements them. And if you have the time and inclination, I highly recommend a copy of Advanced Programming in the Unix Environment (second edition) by Richard Stevens. It’s truly a classic that provides rich examples of so many useful things that Unix can handle for you.

Have you found yourself simplifying code and making it more reliable by stepping back and letting the lower-level system calls do the hard work? Tell us about it in the comments.

Comments on "Everything is Unix"

wcn00

Most of your examples seem to stem from the language in question lacking proper threads support and, in the case of the serialization issue, a proper mutex implementation for threads. I'm very much in favor using low level facilities when the higher language lacks them, I do it all the time, but the rate at which you have to do it is usually indicative of a weakness in the high level language.
I don't think its necessary to resolve those weaknesses because many of these interpreters and compilers were written for specific uses, but if you find yourself stepping outside the language to accomplish crucial activities, then perhaps you've chosen the wrong language in the first place… no?
wcn

lescoke

All of the standard Unix (POSIX) system calls are well supported on most platforms. The only notable exception that comes to mind is Windows where POSIX compatibility is lack luster.

I agree the Stevens book should be on every serious programmer's shelf.

Les

ndatta

Nice article, but several grammatical errors.
- ND

bridget99

I agree wholeheartedly. Also, there are advantages to old-style UNIX multiprocessing (fork() and spawn()) even in languages that do support threads. Newer programmers today tend to think that threads are the only available abstraction for multiprocessing. That's not the case- fork() and spawn() are not just older and more widely available than multithreading, they're also safer.

Assuming one CAN do something with processes (vs. threads), then I think one SHOULD… even if this results in process-spawning code which more up-to-date programmers (erroneously) consider awkward.

I work in a pure Win32 environment but make heavy use of things like Cygwin and MinGW just to inject some sanity into my work. Not so long ago I "fixed" something by recompiling it using MinGW instead of Visual C++. My "audience" probably has very little concept of how much of my work is actually resting on the concepts- and at times the code base – of UNIX.
