Chewing on Progress Bars

Implement a nifty progress bar with a handful of modules and a smattering of code.
If you’re like me, you spend a lot of time getting things over your Internet connection, downloading files to your desktop machine (or in my case, my laptop, which is my only machine). Indeed, one of the things I find myself doing frequently is watching the output of curl, as it keeps me up-to-date on how much has been downloaded and how much is left to do.
I recently stumbled across Term::ProgressBar. This CPAN module can draw a labeled progress bar to show how much of a task has been completed. The bar has a major part drawn with nice = characters, labeled by percentage, and a little flying * that shows percentage complete within each one of the = steps. The bar is drawn in such a way that successive invocations overwrite the previous one, creating the illusion that the bar “grows” in tandem with progress.
One of the nice features of Term::ProgressBar is that it notes the times when each update is called, which can yield an estimate of how many hours, minutes, and seconds are left until the monitored task completes. This is done automatically without any work on the caller’s part, except for requesting the option.
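To get a feel for the module's interface before diving into the download code, here's a minimal sketch. The "Copy" label and the fake 20-unit workload are invented for illustration; it assumes Term::ProgressBar is installed from CPAN.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Term::ProgressBar;

# A minimal sketch: a bar over 20 invented work units, with ETA estimation.
my $bar = Term::ProgressBar->new({ name  => 'Copy',     # label shown at the left
                                   count => 20,         # total units expected
                                   ETA   => 'linear' });
for my $done (1 .. 20) {
    select(undef, undef, undef, 0.02);   # pretend to work for 20 ms
    $bar->update($done);                 # redraw the bar at $done of 20
}
```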
It was this particular feature that made me think I could emulate what curl does during a download with a nice little progress bar. I knew I could hook into the values of the download-in-progress with an LWP::UserAgent content callback, and the result is Listing One.

LISTING ONE: Realizing a progress bar
01#!/usr/bin/perl -w
02use strict;
04use Term::ProgressBar;
05use URI;
06use LWP::UserAgent;
08my $ua = LWP::UserAgent->new;
10while (@ARGV) {
11 my $url = shift;
12 print "$url:\n";
14 my $uri = URI->new($url);
15 my $path = $uri->path;
16 $path =~ s{.*/}{};
17 $path = "download" unless length $path;
18 $path = "X$path" while -e $path;
20 open my $outhandle, ">", $path or die "Cannot create $path: $!";
22 my $bar = Term::ProgressBar->new({ name => 'Download',
23 count => 1024,
24 ETA => 'linear'});
26 my $output = 0;
27 my $target_is_set = 0;
28 my $next_so_far = 0;
29 $ua->get
30 ($url,
31 ":content_cb" => sub {
32 my ($chunk, $response, $protocol) = @_;
34 unless ($target_is_set) {
35 if (my $cl = $response->content_length) {
36 $bar->target($cl);
37 $target_is_set = 1;
38 } else {
39 $bar->target($output + 2 * length $chunk);
40 }
41 }
43 $output += length $chunk;
44 print {$outhandle} $chunk;
46 if ($output >= $next_so_far) {
47 $next_so_far = $bar->update($output);
48 }
50 });
52 $bar->target($output);
53 $bar->update($output);
55}
Line 1 declares my path to Perl, and enables warnings throughout the program. I still use -w instead of use warnings, mostly because I'm lazy and habitual. The problem with -w, though, is that it enables warnings globally, even for code I didn't write or test. With use warnings, only the files (or smaller lexical scope) in which it appears have warnings enabled.
Line 2 enables the standard Perl restrictions, which disable arbitrary barewords and symbolic references and require simple variables to be declared lexically.
Lines 4-6 pull in the three modules I’ve installed from CPAN. Term::ProgressBar provides the progress bar mentioned at the outset, while URI and LWP::UserAgent are part of Bundle::LWP, the useful collection of modules that deal with everything about the Web, except for CGI.
Line 8 creates my virtual user agent in $ua, acting as a client for HTTP transactions. I use this user agent object as I might use a browser, telling it to fetch a particular URL. Various configuration options exist for an LWP::UserAgent object, such as what kind of browser it tells the server it is; however, I've left all the settings at the default because, yes, I'm lazy. (Some Web servers care about the browser identification. In some cases, I might want to go back and reconfigure this user agent to pretend it's a certain version of Internet Explorer or Firefox to access certain "restricted pages." Yes, the server trusts an arbitrary string sent by the browser and some sites use that string to control access. How silly.)
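If a server does gate on the browser identification, LWP::UserAgent's agent method lets you override the string it sends. A quick sketch; the particular agent string here is just an example, not anything a server specifically requires:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;

my $ua = LWP::UserAgent->new;
# Override the default "libwww-perl/..." identification with an
# invented string; substitute whatever the picky server expects.
$ua->agent("Mozilla/5.0 (compatible; MyFetcher/1.0)");
```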
Lines 10-55 loop once for each URL specified on the command-line. The loop exits when @ARGV is finally empty, which happens eventually, because the first item of @ARGV is shifted off into $url in Line 11. Line 12 prints the current URL.
Lines 14-18 try to figure out a suitable local filename for the downloaded information. I want to emulate curl -O by taking the last component of the path as the name, so I leverage the URI module to do the parsing. Line 14 creates a URI object from the requested URL. Line 15 grabs just the path part of the URL — that's the section after the host, but before the optional query string. Line 16 removes everything from the path up to the final (or only) slash.
At this point, $path is a candidate for a filename. However, if it’s empty (the path ends in slash, for example), I force it to be download instead in Line 17. And finally, because I want it to be a new file, I just add X in front of the name until the file doesn’t exist locally in Line 18. (Yes, that’s a pretty hokey chunk of code, but it was good enough for the few samples I tested it with.)
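The same filename logic can be sketched without the URI module, using only regexes. The filename_from_url helper is a hypothetical name invented for this sketch, and it's rougher than a real URI parse (a URL with no path at all would fall through to the hostname), but it shows the shape of the transformation:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: derive a local filename from a URL, regex-only.
sub filename_from_url {
    my ($url) = @_;
    (my $path = $url) =~ s{[?#].*}{};   # drop any query string or fragment
    $path =~ s{.*/}{};                  # keep only the last path component
    return length $path ? $path : "download";
}

print filename_from_url("http://example.com/files/report.pdf"), "\n";  # report.pdf
print filename_from_url("http://example.com/dir/"), "\n";              # download
```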
Once I have a filename, Line 20 opens a handle to that file, using a lexical filehandle and the three-argument open, which works fine in modern versions of Perl (but probably won’t work if you haven’t upgraded Perl since 1998).
Lines 22-24 create the progress bar object. I set the label to Download, which seems appropriate at this point, along with an initial guess at the total size as 1,024 bytes. Later, I update this amount with either a better guess or with the actual byte count as reported by the server. Finally, I also enable estimated-time mode, using linear approximation (the only choice possible).
Line 26 establishes $output. I use $output to count the bytes downloaded so far, so I’ll start with 0.
Line 27 defines a boolean flag, $target_is_set, initially false. When I’ve seen a good length from the server, I use it as the final target value for upper bound and set $target_is_set to true. This keeps me from having to repeatedly check for the value on each iteration, which seems wasteful.
Line 28 holds the number of bytes I should see before updating the bar again. Each call to update() returns the byte count at which roughly another half-second will have elapsed, based on the download rate so far. By paying attention to this value, I can minimize the number of calls I make to update the bar.
Lines 29-50 perform the download, calling the get() method against the user agent. Line 30 defines the desired URL for this request.
Lines 31-50 define a content callback. Normally, as the LWP::UserAgent object fetches the reply, the “content” is loaded into the object and is available only when the entire response has been seen (by calling the content method on the response object). However, I can define a callback subroutine to be called as each chunk is observed from the server.
In this case, as each chunk is observed, I get a call to the subroutine beginning in Line 31 (an anonymous subroutine). The subroutine is passed three values: the chunk of data that’s been read ($chunk in Line 32), the response object as constructed so far ($response), and the protocol handler object ($protocol). I’m not using the $protocol object at all, but the other two are very important.
Lines 34-41 attempt to update the total-bytes target, unless that's already been done for this download (as noted by $target_is_set). Line 35 reaches into the response object, looking for the Content-Length header in the web server's response. If that's provided, I can get a clearer idea of the percentage seen so far.
If known, the content length is set as the target in Line 36, noting that in Line 37. However, for many downloads (especially those that are created dynamically), the server has no idea how many bytes it will eventually send. So, in Line 39, I fake up a target that is everything seen so far, plus the chunk just seen, plus perhaps one more chunk just like it. It’s wrong, but there’s no right value anyway — the bar simply hovers, showing “almost there.”
Once I’ve updated the target, it’s time to actually write the data that’s been seen. I update the total bytes seen so far in Line 43, and then print the data to the handle in Line 44. Actually, I could eliminate the $output variable by using -s on the filehandle every time I need it, since those numbers should be the same. However, that would be making an operating system request repeatedly for information that I can easily calculate, so why not just calculate it?
Lines 46-48 update the bar. Initially, $next_so_far is 0, so I call this method on the first chunk of data I see — that draws the initial bar with the initial guess of maximum bytes (possibly an accurate value directly from the server), and leaves room for a “time remaining” value that will be updated after a few more calls. The return value from the update will modify $next_so_far, giving the suggestion to not call update() again until I’ve seen that many bytes. As mentioned earlier, this is an optimization so that the bar is updated roughly every half-second, based on the timing of calls seen previously for this progress bar. I could completely ignore this value, and just call update() on each chunk. The result would be similar, although a lot more output would be generated.
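The throttling idea stands on its own, independent of Term::ProgressBar: only call the (relatively expensive) redraw when the byte counter passes a threshold that the redraw itself hands back. A self-contained sketch, with an invented fixed 4 KB step in place of the module's time-based calculation:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for $bar->update(): count how often it fires,
# and return the byte count at which it wants to be called again.
my $redraws = 0;
sub redraw {
    my ($seen) = @_;
    $redraws++;
    return $seen + 4096;        # "don't call me again for another 4 KB"
}

my ($seen, $next_threshold) = (0, 0);
for (1 .. 100) {                # simulate 100 chunks of 1 KB each
    $seen += 1024;
    $next_threshold = redraw($seen) if $seen >= $next_threshold;
}
print "$redraws redraws for 100 chunks\n";   # 25 redraws instead of 100 calls
```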
Once the download is complete, the call to get() in Line 29 returns, and I move on with the next step of the program. I want the bar to read “100% downloaded” when I’m done. I know the total length in $output, so I call target in Line 52 to say, “Yes, this number of bytes is 100% ”. I also call update() to say, “Yes, I’ve seen exactly this many bytes” in Line 53. And I’m done with that file!
So there you have it: emulating curl’s download time and percentage using Term::ProgressBar.
Hopefully, you’ve seen enough to add progress bars to your own applications. Also, check out CPAN’s Tk::ProgressBar and CGI::ProgressBar for graphic and Web-based applications, respectively, and Smart::Comments for automatically adding progress bars to your loops.
Until next time, enjoy!

Randal Schwartz is the chief Perl guru at Stonehenge Consulting. You can reach Randal at merlyn@stonehenge.com.
