Checking Your Bookmarks
Randal L. Schwartz
Like most people, I've bookmarked about a third of the known Internet by now.
Of course, sites go away, and URLs become invalid, so some of my lesser-used
bookmarks are pointing off into 404-land.
Some browsers have an option to periodically revalidate bookmarks. My favorite
browser lacks such a feature, but it does include the ability to export
an HTML file of all the bookmarks and reimport a similar file in a way that
can be easily merged back into my existing bookmark setup. So, I thought I'd
take a whack at a Perl-based bookmark validator, especially one that worked
in parallel so that I could get through my bookmark list fairly quickly. The
result is in Listing 1, below.
Lines 1 through 3 declare the program as a Perl program and turn on compiler
restrictions and warnings, which is good programming practice.
Lines 5 through 7 pull in three modules that are found in the CPAN. The HTML::Parser
module enables my program to cleanly parse HTML with all its intricacies. The
LWP::Parallel::UserAgent module provides a means to fetch many Web pages
at once. And finally, HTTP::Request::Common sets up an HTTP::Request
object so that I can fetch it with the user agent.
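To make the division of labor between these modules concrete, here is a minimal sketch (not the code from Listing 1) that uses HTML::Parser to pull link URLs out of a fragment of bookmark HTML and HTTP::Request::Common to turn each one into a request object. The sample HTML, the @requests array, and the choice of HEAD rather than GET are assumptions made for illustration. Handing the requests to LWP::Parallel::UserAgent is left out so that the sketch runs without network access.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::Parser;
use HTTP::Request::Common;   # exports GET, HEAD, POST, ...

# Made-up bookmark fragment, standing in for an exported bookmark file.
my $html = qq{<DT><A HREF="http://example.com/">Example</A>\n};

my @requests;   # hypothetical holding area for the requests to be fetched

my $parser = HTML::Parser->new(
    api_version => 3,
    start_h     => [
        sub {
            # Tag and attribute names arrive lowercased by default.
            my ($tagname, $attr) = @_;
            return unless $tagname eq 'a' and defined $attr->{href};
            # Assumption: a HEAD request is enough to see if the URL is alive.
            push @requests, HEAD($attr->{href});
        },
        'tagname, attr',
    ],
);
$parser->parse($html);
$parser->eof;

print $_->method, " ", $_->uri, "\n" for @requests;
```

Each element of @requests is an ordinary HTTP::Request object, which is the kind of object a user agent expects to be given.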
Lines 9 and 10 set up the user interface for this program. I can use the program
as a filter:
./this_program <Bookmarks.html >NewBookmarks.html
or as an in-place editor:
./this_program Bookmarks.html
As an in-place editor, the Bookmarks.html file will be renamed to Bookmarks.html~
(with an appended tilde), and the new version will appear at the original name.
Lines 11 to 19 edit each file (usually just one) in turn, or the standard
input as one file.
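Rewriting a file in place while keeping a tilde-suffixed backup is what Perl's $^I variable provides when combined with the <> operator. The following is a minimal sketch of that mechanism, not necessarily the code in Listing 1. The edit_in_place helper, the sample file contents, and the substitution are all hypothetical, and the demonstration works on a temporary file so that it can be run safely.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: rewrite each named file in place, leaving the
# original content under the same name with a tilde appended.
sub edit_in_place {
    my ($transform, @files) = @_;
    local @ARGV = @files;   # the files that <> will walk through
    local $^I   = "~";      # in-place mode; the backup suffix is "~"
    local $/;               # undef: read each file as one string
    while (my $content = <>) {
        # While in-place mode is active, print writes to the new file.
        print $transform->($content);
    }
}

# Demonstration on a throwaway file (contents are made up).
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/Bookmarks.html";
open my $out, '>', $file or die "Cannot write $file: $!";
print $out qq{<A HREF="http://example.com/">Example</A>\n};
close $out;

# Placeholder transformation standing in for the real bookmark check.
edit_in_place(
    sub { my $html = shift; $html =~ s/Example/Checked/; $html },
    $file,
);
```

After this runs, the file at $file holds the rewritten text and the file at "$file~" holds the original.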