
Linux 2.6.37: Scalability Improvements Abound

While 2.6.37 might be considered a quiet release, there are some very nice scalability improvements for file systems and one cool new feature that warrant a review.

This year’s holiday kernel was 2.6.37, which was actually released on 4 January 2011 (perhaps it’s a New Year’s kernel). At first glance it looks like a quiet release: there were no flame wars on the web and no seriously flawed benchmarks being posted. However, some great things happened in 2.6.37 around file systems, and there is one really cool feature that I’ll talk about at the end of the article.

Improvements for ext4

Ext4 is the proverbial little engine that could. The file system has proven to have remarkably good performance, and it solves many (arguably most) of the issues with ext3. However, it is still effectively limited to 16 TB because the user-space tools have not been updated yet (a good project if anyone is interested). In 2.6.37, several really cool features were added to ext4, primarily around scalability.

Systems are gaining cores faster than we may realize. A four-socket AMD system with 12 cores per socket, 48 cores in total, is a fairly affordable server. In the 2.6.37 kernel, scalability patches were added to ext4. In particular, ext4 now submits I/O through the “bio” layer directly rather than through the intermediate “buffer” layer, because the buffer layer has a number of performance and SMP scalability issues. The bio layer (bio = Block I/O) is the part of the kernel that hands requests to the I/O scheduler, so bypassing the buffer layer lets performance and scalability improve.

As an example of the scalability improvement, an ffsb benchmark on a 48-core AMD system using a 24-disk hardware RAID array with 192 simultaneous ffsb threads improved performance by 300% (400% with journaling disabled) compared to the kernel without this patch. Moreover, CPU usage in the benchmark dropped by a factor of 3-4.

In addition to the scalability patches, 2.6.37 added a couple of cool new features to ext4. The first is lazy inode table initialization: mke2fs, the command that creates ext-based file systems, can now leave the inode table uninitialized. This means an ext4 file system can be created very quickly, whereas before the entire inode table had to be written out, which took some time. However, the inode table still needs to be initialized as soon as possible, so on the first mount the kernel starts a kernel thread that initializes the table in the background.
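A minimal sketch of the two halves, assuming the e2fsprogs “lazy_itable_init” extended option and the ext4 “init_itable=n” mount option (which, as I understand it, makes the background thread wait n times as long as the previous block group took to zero before moving on). The device and mount point are hypothetical placeholders, and the function only builds the command lines; it runs nothing.

```python
from typing import List, Tuple


def lazy_init_commands(device: str, mountpoint: str,
                       wait_multiplier: int = 10) -> Tuple[List[str], List[str]]:
    """Build (mkfs, mount) argument lists for a lazily initialized ext4 file system.

    device and mountpoint are placeholders; nothing is executed here.
    """
    if wait_multiplier < 0:
        raise ValueError("wait_multiplier must be non-negative")
    # Ask mke2fs to skip zeroing the inode tables at creation time.
    mkfs = ["mkfs.ext4", "-E", "lazy_itable_init=1", device]
    # On mount, the kernel's background thread zeroes the tables; init_itable=n
    # throttles it relative to how long the previous block group took.
    mount = ["mount", "-t", "ext4", "-o", f"init_itable={wait_multiplier}",
             device, mountpoint]
    return mkfs, mount


mkfs_cmd, mount_cmd = lazy_init_commands("/dev/sdX1", "/mnt/data")
print(" ".join(mkfs_cmd))
print(" ".join(mount_cmd))
```

Passing the lists to subprocess.run (as root, on a scratch device) would perform the steps; keeping them as data makes it easy to review the options first.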

The second patch added batched discard support to ext4. This may sound uneventful, but it brings a big improvement in one area: SSDs. Recall that the TRIM command lets the operating system tell an SSD which blocks are no longer in use, so the drive can reclaim them ahead of time, which can result in much better overall performance. In Linux, the generic equivalent of TRIM is called discard. This patch adds the ability to do batched discards (many blocks at once), allowing the entire file system to be “trimmed” if needed. So far, ext4 is the first Linux file system to support batched discards.

In addition to these two major new features, there is a smaller one that is useful nonetheless: ext4 now lists, or “advertises,” its supported features in sysfs. More specifically, the directory /sys/fs/ext4/features lists the ext4 features available in the running kernel. That is very useful for anyone who wants, or needs, to know exactly which features their version of ext4 supports.
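Because each advertised feature shows up as an entry in that directory, checking for one is just a directory listing. A minimal sketch (the function name is hypothetical, and the exact entry names depend on the kernel):

```python
import os
from typing import List


def ext4_features(sysfs_dir: str = "/sys/fs/ext4/features") -> List[str]:
    """List the ext4 features advertised by the running kernel.

    Each feature is an entry in sysfs_dir. Returns [] if the directory is missing,
    e.g. on kernels before 2.6.37 or when ext4 is not loaded.
    """
    try:
        return sorted(os.listdir(sysfs_dir))
    except FileNotFoundError:
        return []


if __name__ == "__main__":
    for feature in ext4_features():
        print(feature)
```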

Scalability improvements in xfs

Xfs is one of the highest-performing file systems in Linux for certain workloads. It is very popular with the HPC (High Performance Computing) crowd because of its excellent file performance, particularly for larger files. However, it has a reputation for poor metadata performance. It is still under heavy development, and many of the recent patches have targeted metadata performance.

In the 2.6.37 release, xfs gained some scalability improvements, in particular for metadata workloads. For example, on an 8-way system, the fs_mark benchmark creating 50 million files ran over 15% faster, and removing those files was over 100% faster.

Of course, other improvements and features were added to xfs in 2.6.37. In a previous article I mentioned that a new logging option, delayed logging, was added in the 2.6.35 kernel; it can reduce the I/O bandwidth needed for the log by several orders of magnitude. This can greatly improve metadata performance for really heavy metadata workloads. In 2.6.37, a patch removed the “experimental” label from delayed logging, marking it as production ready.
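In these kernels delayed logging is turned on per mount with the xfs “delaylog” option, and one way to confirm that a file system actually got it is to look at /proc/mounts. The sketch below is a hypothetical helper that parses /proc/mounts-style text; it assumes the kernel lists active options in the fourth field and does not handle paths containing escaped spaces.

```python
from typing import Optional


def mount_has_option(mounts_text: str, mountpoint: str, option: str) -> Optional[bool]:
    """Check whether mountpoint is mounted with option, given /proc/mounts text.

    Fields per line: device, mount point, fs type, comma-separated options,
    dump, pass. Returns None if the mount point is not listed.
    Paths with escaped characters (e.g. spaces shown as \\040) are not decoded.
    """
    result = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[1] != mountpoint:
            continue
        # A later entry for the same path shadows earlier ones, so keep the last.
        result = option in fields[3].split(",")
    return result
```

For example, mount_has_option(open("/proc/mounts").read(), "/data", "delaylog"), where /data is a placeholder for an xfs mount point.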

Other improvements/changes added to xfs in 2.6.37 are:


  • Project quotas now support 32-bit project IDs
  • XFS_IOC_ZERO_RANGE was introduced, an ioctl that lets applications quickly zero ranges of a file without changing the file’s layout in any way
  • The buffer cache hash was converted to use rbtrees because the hash was showing scalability problems. Switching to rbtrees should greatly improve performance and scalability, particularly for systems doing a great deal of I/O.

Btrfs improvements

Everyone’s favorite file-system-in-development, btrfs, had some interesting patches added in 2.6.37. If you watch the btrfs mailing list, you will see lots of active testing. This has resulted in a number of good patches, even if they aren’t adding significant new “features”. Several of the patches can be considered “major”, and there are some very good “minor” patches as well.

Probably the most significant feature added to btrfs is an on-disk cache of free space information. That sounds confusing, so let me explain. Before this patch, if btrfs had to allocate from a block group whose free space had not already been cached, it had to scan the extent tree (i.e., it took a great deal of time and resources to find the free space in that block group). After this patch, every time a transaction commits, the free space of each dirtied block group is written to the on-disk free space cache. Finding free space is then a simple lookup, which greatly improves performance in this situation.
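To make the difference concrete, here is a toy Python model, not btrfs’s actual data structures or on-disk format. Allocations in each block group stand in for the extent tree, finding free space without a cache means walking them, and each commit saves the free ranges of dirtied block groups so later lookups become a dictionary hit. The class and method names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Extent = Tuple[int, int]  # (start block, length in blocks)


class ToyBlockGroups:
    """Toy model of free-space lookup with and without a commit-time cache."""

    def __init__(self, group_size: int) -> None:
        self.group_size = group_size
        self.allocated: Dict[int, List[Extent]] = {}   # stands in for the extent tree
        self.free_cache: Dict[int, List[Extent]] = {}  # stands in for the on-disk cache
        self.dirty: Set[int] = set()                   # groups changed since last commit
        self.scans = 0                                 # slow-path walks performed

    def allocate(self, group: int, start: int, length: int) -> None:
        self.allocated.setdefault(group, []).append((start, length))
        self.dirty.add(group)  # any cached free list for this group is now stale

    def _free_from_allocations(self, group: int) -> List[Extent]:
        # Free ranges are the gaps between allocated extents.
        free: List[Extent] = []
        cursor = 0
        for start, length in sorted(self.allocated.get(group, [])):
            if start > cursor:
                free.append((cursor, start - cursor))
            cursor = max(cursor, start + length)
        if cursor < self.group_size:
            free.append((cursor, self.group_size - cursor))
        return free

    def commit(self) -> None:
        # Save free space for every dirtied group. (Real btrfs writes out what it
        # already tracks in memory; the toy simply recomputes it.)
        for group in self.dirty:
            self.free_cache[group] = self._free_from_allocations(group)
        self.dirty.clear()

    def free_space(self, group: int) -> List[Extent]:
        if group in self.free_cache and group not in self.dirty:
            return self.free_cache[group]  # fast path: a simple lookup
        self.scans += 1                    # slow path: walk the "extent tree"
        return self._free_from_allocations(group)
```

In the toy, the first free_space call on a group pays for a walk; after commit, repeated calls are lookups until the group is dirtied again.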

This patch results in a disk format change for btrfs. Recall that btrfs is still in development, so don’t be surprised by disk format changes. However, existing btrfs file systems can still be mounted without using the cache; in fact, the cache is currently off by default and has to be enabled with the “-o space_cache” mount option.

Another major feature added to btrfs in 2.6.37 is asynchronous snapshot creation. The benefit of this feature is that you don’t have to wait for a new snapshot to be committed to disk. You can use it by adding “async” to the “btrfs subvolume snapshot” command.
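The difference between the two modes can be modeled in a few lines. This is a hypothetical toy, not the btrfs kernel interface: the synchronous call blocks until the simulated commit finishes, while the asynchronous call returns a future right away, and the caller can wait for the commit later only if it cares.

```python
import threading
import time
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Dict


class ToySnapshotter:
    """Toy model of synchronous vs. asynchronous snapshot creation (illustrative only)."""

    def __init__(self, commit_delay: float = 0.05) -> None:
        self.commit_delay = commit_delay
        self.snapshots: Dict[str, int] = {}  # snapshot name -> commit id
        self._transid = 0
        self._lock = threading.Lock()
        self._pool = ThreadPoolExecutor(max_workers=1)

    def _commit(self, name: str) -> int:
        time.sleep(self.commit_delay)  # stands in for writing the commit to disk
        with self._lock:
            self._transid += 1
            self.snapshots[name] = self._transid
            return self._transid

    def snapshot(self, name: str) -> int:
        # Synchronous: the caller blocks until the snapshot is committed.
        return self._commit(name)

    def snapshot_async(self, name: str) -> "Future[int]":
        # Asynchronous: return at once; the commit completes on a worker thread.
        return self._pool.submit(self._commit, name)

    def close(self) -> None:
        self._pool.shutdown(wait=True)
```

A caller such as a distributed storage daemon can issue snapshot_async, keep serving requests, and call result() on the future only at the point where it needs the snapshot to be durable.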

Believe it or not, asynchronous snapshot creation was added primarily with ceph in mind. Remember that ceph, a distributed parallel file system that is still under heavy development, was added a few kernel versions ago. Ceph can use btrfs as its underlying file system (Ceph can arguably be called a meta file system since it is a file system on top of a file system). There is more on Ceph itself later in this article.

A somewhat minor feature added to btrfs in 2.6.37 is the ability for unprivileged users to delete subvolumes. However, a user can only delete a subvolume if they have “write” and “execute” permission on the subvolume’s root inode. The feature is enabled by mounting btrfs with the “-o user_subvol_rm_allowed” option.
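The permission rule described above can be approximated from user space with a simple access check. This is a hypothetical helper, assuming the file system is already mounted with “user_subvol_rm_allowed”; the kernel may apply further checks, so a True result only means the deletion is worth attempting.

```python
import os


def may_delete_subvolume(subvol_root: str) -> bool:
    """Rough user-space check of the unprivileged subvolume deletion rule.

    Assumes btrfs is mounted with user_subvol_rm_allowed. Checks only that the
    calling user has write and execute permission on the subvolume's root
    directory; the kernel may apply further checks.
    """
    return os.access(subvol_root, os.W_OK | os.X_OK)
```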

An additional minor change switched the extent buffer cache from an rbtree to a radix tree. This should reduce the CPU time spent searching for extent buffers and improve performance for some operations (see the commit for more details).

The last btrfs feature I want to mention concerns chunk allocation tuning: this patch allows data and metadata block groups to be mixed. According to the Kernel Newbies article on 2.6.37, this should be useful for small storage devices.
