One of the classic problems in designing storage solutions is that we don’t know what kind of IO performance applications need. We may have some empirical knowledge, “switching to SSD’s improved by wall clock time by 25%,” or, “adding an extra drive improved my application performance by 11%.” While this information is helpful to some degree what is missing is the understanding of why performance improved and by extension, what are the driving mechanisms behind the performance improvement. Perhaps equally important for the application developers is that understanding the driving forces of IO performance for your application can be used to improve the IO performance, if needed.
This article is an introduction to using a common Linux system administration tool - strace. Strace output from an application can be used to produce a huge amount of information about the IO behavior of your application.
Strace is your Friend
The goal behind IO profiling is to understand the application not necessarily to understand the storage system itself. The information is to be about the application - how does it perform writes, how does it perform reads, how much lseek happens, what kind of IOPS performance does the application require, etc. However, to obtain this information was can’t divorce ourselves completely from the storage system but we can at least gain something beyond an empirical knowledge.
The tool that we’re going to use is strace. This tool is commonly used for understanding .so libraries that are added during runtime as well as gathering other debugging information about misbehaving applications. What strace does is trace system calls and signals. Since the system is doing the IO strace can be used to trace what is happening with IO syscalls calls. Moreover, you can use strace to gather additional information about the system calls (syscall) such as how much time was spent performing the syscall. All of this information can be used to profile an application’s IO patterns.
Strace has a large number of options (just run “man strace”). Some of them are relevant to this discussion and most are not. The specific options used in this article are:
% strace -T -ttt -o strace.out
The “-T” option cause strace to print out the elapsed time of the system call at the end of the line in the form, <0.000197>. This is amount of time used to complete the syscall. The option “-ttt” causes microsecond timing but does so in seconds since the epoch (makes life easier for applications that run a long time). The last option, “-o”, sends the output to a file - in this case “strace.out.”
Quick strace example
Let’s look at a quick example of using strace. Here is a simple C code that creates an array then dumps it to a file using the function fwrite().
#include
#define BUFSIZE 100000
int main(int argc, int *argv[] )
{
int i;
float buf[BUFSIZE];
char filename[128];
FILE *myfile;
for (i=0; i < BUFSIZE; i++)
buf[i] = 2.5 + BUFSIZE * i;
sprintf(filename, "testfile");
myfile = fopen(filename, "w");
fwrite(buf, sizeof(float), BUFSIZE, myfile);
fclose(myfile);
}
The code takes an array of float variables and dumps the entire array to a file (”testfile”) using the fwrite function.
The code is built using gcc 4.3.2 (Ubuntu 4.3.2-lubuntul2) with no options. The strace output using the options “-T -ttt” is the following:
1267394331.467245 execve("./test1c", ["./test1c"], [/* 37 vars */]) = 0 <0.000211>
1267394331.467773 brk(0) = 0×951e000 <0.000012>
1267394331.467862 access(”/etc/ld.so.nohwcap”, F_OK) = -1 ENOENT (No such file or directory) <0.000016>
1267394331.467973 mmap2(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb80d8000 <0.000014>
1267394331.468055 access(”/etc/ld.so.preload”, R_OK) = -1 ENOENT (No such file or directory) <0.000016>
1267394331.468137 open(”/etc/ld.so.cache”, O_RDONLY) = 3 <0.000017>
1267394331.468212 fstat64(3, {st_mode=S_IFREG|0644, st_size=121820, …}) = 0 <0.000012>
1267394331.468349 mmap2(NULL, 121820, PROT_READ, MAP_PRIVATE, 3, 0) = 0xb80ba000 <0.000014>
1267394331.468405 close(3) = 0 <0.000011>
1267394331.468480 access(”/etc/ld.so.nohwcap”, F_OK) = -1 ENOENT (No such file or directory) <0.000016>
1267394331.468573 open(”/lib/tls/i686/cmov/libc.so.6″, O_RDONLY) = 3 <0.000021>
1267394331.468655 read(3, “177ELF111 3 3 1 340g1″…, 512) = 512 <0.000014>
1267394331.468750 fstat64(3, {st_mode=S_IFREG|0755, st_size=1425800, …}) = 0 <0.000012>
1267394331.468873 mmap2(NULL, 1431152, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0xb7f5c000 <0.000014>
1267394331.468934 mmap2(0xb80b4000, 12288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0×158) = 0xb80b4000 <0.000022>
1267394331.469009 mmap2(0xb80b7000, 9840, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0xb80b7000 <0.000017>
1267394331.469078 close(3) = 0 <0.000011>
1267394331.469152 mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7f5b000 <0.000014>
1267394331.469215 set_thread_area({entry_number:-1 -> 6, base_addr:0xb7f5b6b0, limit:1048575, seg_32bit:1, contents:0, read_exec_only:0, limit_in_pages:1, seg_not_present:0, useable:1}) = 0 <0.000013>
1267394331.469382 mprotect(0xb80b4000, 8192, PROT_READ) = 0 <0.000016>
1267394331.469441 mprotect(0×8049000, 4096, PROT_READ) = 0 <0.000019>
1267394331.469506 mprotect(0xb80f5000, 4096, PROT_READ) = 0 <0.000014>
1267394331.469560 munmap(0xb80ba000, 121820) = 0 <0.000021>
1267394331.471100 brk(0) = 0×951e000 <0.000012>
1267394331.471153 brk(0×953f000) = 0×953f000 <0.000013>
1267394331.471224 open(”testfile”, O_WRONLY|O_CREAT|O_TRUNC, 0666) = 3 <0.000035>
1267394331.471325 fstat64(3, {st_mode=S_IFREG|0644, st_size=0, …}) = 0 <0.000013>
1267394331.471446 mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb80d7000 <0.000013>
1267394331.471509 write(3, “