Do you have a brand new SSD? Do you plan to partition it? Let's talk about the best way to set up your SSD so partitions -- and the resulting file systems -- align on page boundaries, thus improving performance and minimizing the number of rewrite cycles.
I happen to live in a city with a MicroCenter store and I just bought a new 64GB SSD that uses a SandForce 1222 controller. I’ve been interested in testing the real-time data compression of the SandForce controller on a number of benchmarks and applications. So I finally have one! But before I jump into testing I need to think about configuring the SSD.
The challenge we face is that partitions fall on cylinder boundaries (remember that fdisk in Linux uses “heads” and “tracks” to define cylinders). If a cylinder boundary is not aligned with the “page” size of an SSD, the drive can end up doing extra work in the form of read/modify/write cycles, burning extra write cycles and reducing performance. If you aren’t going to partition your SSD then you don’t have to worry about this too much, although aligning certainly doesn’t hurt.
By default, Linux fdisk uses a geometry of 255 heads, 63 sectors/track, and (still, even today) 512-byte sectors. This works out to 16,065 512-byte sectors per cylinder, or 2,008.125 4KB pages, which is clearly not a whole number of pages. So we need to adjust the geometry so that cylinder boundaries, and therefore any partitions, land on 4KB page boundaries.
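As a quick sanity check of that arithmetic (plain shell arithmetic and bc, nothing here touches the drive), you can confirm that the default geometry does not give a whole number of 4KB pages per cylinder:

echo $((255 * 63))                             # sectors per cylinder -> 16065
echo "scale=3; 255 * 63 * 512 / 4096" | bc     # 4KB pages per cylinder -> 2008.125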
If you look around the web a little, you can find geometry recommendations for various SSDs. For example, Theodore Ts’o, the ext4 maintainer, has a blog post about this very subject. His recommendation is the following:
224 heads (32*7)
56 sectors per track (8*7)
This results in 12,544 sectors per cylinder (256 * 49). Using 56 sectors per track gives 56*512 bytes, or 28,672 bytes per track, which is exactly seven 4KB pages per track, so every cylinder holds an integer number of 4KB pages. Therefore any partition that starts on a cylinder boundary will be aligned. A quick check of the arithmetic is below, followed by an example fdisk session.
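Again, just as a sanity check (plain shell arithmetic, nothing SSD-specific), the sectors and 4KB pages per cylinder both come out as whole numbers:

echo $((224 * 56))                     # sectors per cylinder -> 12544
echo $(( 224 * 56 * 512 / 4096 ))      # 4KB pages per cylinder -> 1568
echo $(( 224 * 56 * 512 % 4096 ))      # remainder -> 0, so cylinders land on page boundaries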
[root@test64 ~]# fdisk -H 224 -S 56 /dev/sdd
The number of cylinders for this disk is set to 9345.
There is nothing wrong with that, but this is larger than 1024,
and could in certain setups cause problems with:
1) software that runs at boot time (e.g., old versions of LILO)
2) booting and partitioning software from other OSs
(e.g., DOS FDISK, OS/2 FDISK)
Command (m for help): n
Command action
e extended
p primary partition (1-4)
p
Partition number (1-4): 1
First cylinder (1-9345, default 1): 2
Last cylinder or +size or +sizeM or +sizeK (2-9345, default 9345):
Using default value 9345
Command (m for help): w
The partition table has been altered!
Calling ioctl() to re-read partition table.
Syncing disks.
Notice that I started on the second cylinder to make sure that the partition /dev/sdd1 starts on a cylinder boundary (a partition that begins in the first cylinder is started one track in, just after the MBR, so it would not fall on the cylinder boundary).
I can check the partitioning using the “-l” option with fdisk.
[root@test64 ~]# fdisk -l /dev/sdd
Disk /dev/sdd: 60.0 GB, 60022480896 bytes
224 heads, 56 sectors/track, 9345 cylinders
Units = cylinders of 12544 * 512 = 6422528 bytes
Device Boot Start End Blocks Id System
/dev/sdd1 2 9345 58605568 83 Linux
We can also look at the number of sectors using the “-lu” option.
[root@test64 ~]# fdisk -lu /dev/sdd
Disk /dev/sdd: 60.0 GB, 60022480896 bytes
224 heads, 56 sectors/track, 9345 cylinders, total 117231408 sectors
Units = sectors of 1 * 512 = 512 bytes
Device Boot Start End Blocks Id System
/dev/sdd1 12544 117223679 58605568 83 Linux
The partition starts on sector 12,544 (256 * 49) and ends at the end of the device.
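If you want to double-check the alignment on your own drive, a small sketch like the following works. It assumes the device is /dev/sdd, the partition is /dev/sdd1, and that there is no boot-flag column in the fdisk output (a bootable partition would shift the Start column):

# Read the Start column for /dev/sdd1 and test it against the 4KB page size (run as root)
START=$(fdisk -lu /dev/sdd | awk '$1 == "/dev/sdd1" {print $2}')
echo $(( START * 512 % 4096 ))         # prints 0 when the partition starts on a 4KB page boundary

For the partition above this prints 0, since 12,544 * 512 = 6,422,528 bytes is a multiple of 4,096.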
An alternative offered on the OCZ Technology Forum site uses a slightly different geometry.
32 heads
32 sectors per track
The result is 1,024 sectors per cylinder (32 * 32). With 512-byte sectors this gives 512KB cylinders, which are aligned on 4KB page boundaries (128 pages per cylinder). An example of this on the same device (/dev/sdd) is below.
[root@test64 ~]# fdisk -H 32 -S 32 /dev/sdd
The number of cylinders for this disk is set to 114483.
There is nothing wrong with that, but this is larger than 1024,
and could in certain setups cause problems with:
1) software that runs at boot time (e.g., old versions of LILO)
2) booting and partitioning software from other OSs
(e.g., DOS FDISK, OS/2 FDISK)
Command (m for help): n
Command action
e extended
p primary partition (1-4)
p
Partition number (1-4): 1
First cylinder (1-114483, default 1): 2
Last cylinder or +size or +sizeM or +sizeK (2-114483, default 114483):
Using default value 114483
Command (m for help): w
The partition table has been altered!
Calling ioctl() to re-read partition table.
Syncing disks.
The partitioning can be checked by using the “-l” option with fdisk.
[root@test64 ~]# fdisk -l /dev/sdd
Disk /dev/sdd: 60.0 GB, 60022480896 bytes
32 heads, 32 sectors/track, 114483 cylinders
Units = cylinders of 1024 * 512 = 524288 bytes
Device Boot Start End Blocks Id System
/dev/sdd1 2 114483 58614784 83 Linux
Notice that the “Units” for this geometry is 512KB (524,288 Bytes) and it has a much larger number of cylinders than the first example. We can also look at the sector layout using the “-lu” option with fdisk.
[root@test64 ~]# fdisk -lu /dev/sdd
Disk /dev/sdd: 60.0 GB, 60022480896 bytes
32 heads, 32 sectors/track, 114483 cylinders, total 117231408 sectors
Units = sectors of 1 * 512 = 512 bytes
Device Boot Start End Blocks Id System
/dev/sdd1 1024 117230591 58614784 83 Linux
Notice that the partition starts on sector 1,024 which, with 512-byte sectors, means it is aligned on a 512KB boundary (and therefore on a 4KB page boundary as well).
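As another quick check, recent kernels expose each partition’s start sector through sysfs, so you can verify the alignment without running fdisk at all (again assuming the device is /dev/sdd and the partition is /dev/sdd1):

cat /sys/block/sdd/sdd1/start                              # start sector -> 1024 for the partition above
echo $(( $(cat /sys/block/sdd/sdd1/start) * 512 % 4096 ))  # 0 means the start is 4KB-aligned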
Which option is better? I think that depends on a number of factors, especially how the SSD is constructed and how the firmware works. If you aren’t going to partition your SSD, then you don’t need to worry about these steps: just use the whole device, /dev/sdd for example, and you should be fine. But if you are going to partition the device, then you might want to consider one of these two options to make sure the partitions are aligned on good boundaries for performance and longevity.
Note: For the genesis of the dry-erase marker abuse, click here.
Jeff Layton is an Enterprise Technologist for HPC at Dell. He can be found lounging around at a nearby Fry’s enjoying the coffee and waiting for sales (but never during working hours).
Comments on "Aligning SSD Partitions"
As a side note, if you use the entire SSD for one file system (e.g. use /dev/sda instead of /dev/sda1), you obviously don’t need to partition the disk. And since /dev/sda is aligned by definition, you don’t have to worry about any of this. This is generally not an option for a boot disk, but if you have one SSD dedicated for /home, by all means use the entire disk, and don’t worry about the magic numbers.
In the video, Jeff states that it is the blocks that get rewritten and that there is a performance hit for writes crossing the block boundary, then goes on to try to align on a page…. huh? If you page align (not on a block boundary), then it seems to me that there is a much higher probability of a multi-page write crossing the block boundary than if you were block aligned.
I think OCZ got it right.
@mbergandi,
I explicitly say that if you don’t partition on page boundaries you can get a performance hit because of pages going across block boundaries.
Jeff
Sorry if my question is dumb, but do these rules also apply to pen drives?
@josir,
It should apply to pen drives as well since they are constructed in the same manner. But pen drives are fairly slow sometimes because of the drive and sometimes because of the interface (e.g. USB). However, since they are built in the same way, it’s always worth trying (never hurts).
Good luck! (and that was a dumb question).
Jeff
Oh boy do I owe @josir an apology. I meant to say that was NOT a dumb question. I didn’t check my response before I posted.
Once again – my apologies to josir. Your question was definitely not dumb – it was an excellent question.
Jeff
Informative video, but surely when updating a bit, the level of granularity is a page and not a block of them?
Writing 512KB instead of a 4KB page to change one bit seems really crazy to me.
(I’m still waiting for SSDs that have unlimited write cycles and none of this wear levelling stuff).
It’s funny but erases are still based on blocks. The basis for this is in the chip design (which I don’t know anything about). It seems like chips could be redesigned but I think this will also cause a big ripple in the whole design of the SSD.
I think the only way to get SSDs that have unlimited write cycles is to change the underlying technology. Not sure what it will be but I think you have to get away from the NAND tunneling concepts.
Jeff
The article makes a small transposition error. It states “Using 56 sectors per track gives 56*512 bytes or 28,762 bytes per track.” My calculator disagrees: 56*512=28672.
Wouldn’t we all be better off if fdisk just used hex?
Ted Ts’o’s blog has moved. The SSD post is now at http://tytso.livejournal.com/2009/02/20/.
Thanks for the heads-up @cibwaknoy. We updated the link.
I don’t agree at all. In an SSD there is a controller doing wear levelling, and an FTL (flash translation layer) maps the logical addresses coming from the file system to physical addresses on the SSD. It’s all an illusion for the file system. The actual physical memory of an SSD is usually greater than the logical capacity because the FTL algorithm needs extra space to do its wear levelling.
If I take 4 SSD disks and stripe them with RAID0 will the disks be aligned? I keep my OS on sda1, but build a RAID for scratch I/O with:
mdadm --create --verbose /dev/md0 --level=raid0 --raid-devices=4 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1
Each of the partitions sdb1 .. sde1 was just created starting at the first cylinder.
I’m planning on building a system with an SSD for the OS (Ubuntu). Would it work if I first use fdisk on it as described and then let Ubuntu install? (And preferably without custom settings, or would those undo the alignment I made with fdisk?)
Also, could you maybe comment on eekamran’s comment about the SSD’s controller making these instructions unnecessary? (At least that’s what I think he’s saying).
Thank you (I’m a newbie).
time for another dumb question: if I use the graphical “disk utility” that comes with CentOS 6 and use the setting to align partitions by MiB, wouldn’t that take care of aligning things on the SSD and eliminate having to worry about all of the exotic geometry calculations?