Archive for the ‘perl’ Category
Perl script to back up your website using your RSS feed
This is a very simple Perl script that grabs your RSS feed, pulls the link for each page, downloads that page, and writes the HTML to your computer’s hard drive. It creates a separate directory for each year and month and stores each HTML page in the directory for the month it was published.
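As a rough illustration of the "pull the link for each page" step, here is a minimal sketch. It is not the code from the script itself: the helper name item_links and the sample feed are made up for the example, and a regex stands in for a proper RSS parser.

```perl
use strict;
use warnings;

# Hypothetical helper: return the <link> of every <item> in an RSS 2.0 string.
# A regex is enough for a sketch; a real script would more likely use an
# XML or RSS parsing module from CPAN.
sub item_links {
    my ($rss) = @_;
    my @links;
    while ($rss =~ m{<item\b.*?>(.*?)</item>}gs) {
        my $item = $1;
        push @links, $1 if $item =~ m{<link>\s*(.*?)\s*</link>}s;
    }
    return @links;
}

# Made-up sample feed, standing in for what get() would return for a real feed URL.
my $sample = <<'RSS';
<rss version="2.0"><channel>
  <link>http://example.com/</link>
  <item><title>One</title><link>http://example.com/2007/01/one.html</link></item>
  <item><title>Two</title><link>http://example.com/2007/02/two.html</link></item>
</channel></rss>
RSS

print "$_\n" for item_links($sample);
```

Only links inside an item are returned, so the channel-level link to the site's front page is skipped. Each returned link could then be downloaded and written to disk.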
So for this website it would create a root directory ‘herselfswebtools.com’, a directory under that for ‘2007’, and under 2007 directories for ‘01’, ‘02’, ‘03’, ‘04’, ‘05’, ‘06’ and ‘07’. The full page, including CSS, sidebars, etc., is then written to the proper month’s directory. As of now it does not download or save images.
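The year and month layout could be worked out from each item's publication date along these lines. This is only a sketch under assumptions: the helper name month_dir is hypothetical, the date is assumed to be in the usual RSS 2.0 pubDate form, and a temporary directory is used as the root so the example does not touch your real files.

```perl
use strict;
use warnings;
use File::Spec;
use File::Path qw(make_path);
use File::Temp qw(tempdir);

my %MONTH = (
    Jan => '01', Feb => '02', Mar => '03', Apr => '04',
    May => '05', Jun => '06', Jul => '07', Aug => '08',
    Sep => '09', Oct => '10', Nov => '11', Dec => '12',
);

# Hypothetical helper: map a pubDate such as
# "Tue, 03 Jul 2007 14:05:00 +0000" to <root>/2007/07.
# Returns nothing if the date cannot be understood.
sub month_dir {
    my ($root, $pub_date) = @_;
    my ($mon, $year) = $pub_date =~ /\b\d{1,2}\s+([A-Z][a-z]{2})\s+(\d{4})\b/
        or return;
    return unless exists $MONTH{$mon};
    return File::Spec->catdir($root, $year, $MONTH{$mon});
}

# Demo in a throwaway directory that is removed when the program exits.
my $root = File::Spec->catdir(tempdir(CLEANUP => 1), 'herselfswebtools.com');
my $dir  = month_dir($root, 'Tue, 03 Jul 2007 14:05:00 +0000');
make_path($dir);    # creates the root, year and month directories as needed
print "$dir\n";
```

make_path creates any missing parent directories in one call, so the root and year directories do not need separate handling.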
This first script is intended to be general and able to back up any website with an RSS feed. (There are two scripts for Blogger on the sidebar as well; details on them are coming Monday and Wednesday. Or you can just download them and read the notes in the scripts.)
There are two things you’ll need to change, both on this line (line 63 of the script):
$content = get("http://www.blogger.com/feeds/9999999999999/posts/default?max-results=500&alt=rss");
You need to change that series of 9s to your blog ID number, and if you have more than 500 posts you’ll want to raise max-results to a larger number. If you are backing up a non-Blogger website, you should be able to just use that site’s RSS feed URL instead.
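If you would rather not edit the URL by hand, the two values can be pulled out into parameters. A small sketch, assuming the same URL pattern as the line above; the helper name blogger_feed_url is made up for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: build the Blogger feed URL from the two values
# you would otherwise edit by hand. max-results defaults to 500.
sub blogger_feed_url {
    my ($blog_id, $max_results) = @_;
    $max_results = 500 unless defined $max_results;
    return "http://www.blogger.com/feeds/$blog_id/posts/default"
         . "?max-results=$max_results&alt=rss";
}

# 9999999999999 is the placeholder ID from the post, not a real blog.
print blogger_feed_url('9999999999999', 1000), "\n";
```

The resulting string is what you would hand to get() in place of the hard-coded URL.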
Backup Blogger Posts Perl script
You might need to install a Perl module or two. If you are not familiar with how to do so, just follow the directions in the ‘Installing Perl modules’ post below.
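One way to find out up front which modules are missing is to try loading each one and report the failures. This is a sketch, not part of the backup script: the helper name missing_modules is hypothetical, and the list of modules to check is only an example (LWP::Simple is the one this site's posts mention).

```perl
use strict;
use warnings;

# Hypothetical helper: return the modules from the list that fail to load.
sub missing_modules {
    my @missing;
    for my $module (@_) {
        # require wants a file path, so LWP::Simple becomes LWP/Simple.pm
        (my $file = "$module.pm") =~ s{::}{/}g;
        push @missing, $module unless eval { require $file; 1 };
    }
    return @missing;
}

# Example list only; put the modules your script uses here.
my @wanted = ('LWP::Simple', 'File::Path');
print "missing: $_\n" for missing_modules(@wanted);
```

Anything it prints is a module you still need to install. If it prints nothing, everything on the list loaded.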
Installing Perl modules
I’m going to be writing some more Perl scripts to make website maintenance easier. Today I hunted down some information on RSS feeds and using Perl to download them.
Ah, but since I had not done much with Perl on this computer, I got caught in a seemingly endless run of ‘Can’t locate BlahBlah/Blah.pm’ errors. Egads. For every module I found and installed, another one needed to be hunted down.
The very easiest way to do this and retain your sanity is to use the CPAN module that comes with Perl.
As root, so you have permission to install modules in the library path, type:
perl -MCPAN -e shell;
It will ask if you want to do the manual configuration. Hit enter, and for almost every question you’ll be able to just hit enter and accept the option it chose. There will be a couple where you have to give it a response other than enter, so pay attention as you go through the questions.
Once that is done you’ll be dropped into a cpan shell:
cpan>
Now all you have to do is try to run your program in one terminal window and install missing modules in the cpan window. For example:
./rss2html.pl
Can’t locate LWP/Simple.pm in @INC blah blah blah
So in the cpan window type
cpan> install LWP::Simple
Just replace slashes with :: and drop the .pm. It will locate, compile and install the module for you. Occasionally you will have to force a module.
cpan> force install LWP::Simple
Module names are case sensitive, so type them exactly as they appear in the error message. This is the easiest and most painless way to collect all the modules you will need to run Perl scripts.
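The "replace slashes with :: and drop the .pm" rule is simple enough to write down as code. A minimal sketch; the helper name module_from_error is made up for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a "Can't locate ..." error message into the
# module name to give to the cpan shell.
sub module_from_error {
    my ($message) = @_;
    my ($path) = $message =~ m{Can't locate (\S+)\.pm\b} or return;
    (my $module = $path) =~ s{/}{::}g;    # LWP/Simple -> LWP::Simple
    return $module;
}

print module_from_error("Can't locate LWP/Simple.pm in \@INC"), "\n";
# prints LWP::Simple
```

The printed name is what goes after install at the cpan> prompt.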
* You will probably need to install a module or two to use the Perl Blogger backup scripts I posted in the top left sidebar last weekend.
More information:
Using RSS News Feeds