Trivial Work: 12/2/07

OK, so I am new to Perl, I confess. Normally I'd say "I am a bash guy!" and sort of brush it off. But today I hit the limits of bash. Task at hand: "Compare two files with lists of md5sums".
"diff" does not quite cut it, as its output is too unstructured for this. So I want to take each md5sum (the first field of every line) from one file and remove the matching entries from the other. That way the entries that remain are the ones that differ.
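The post doesn't say how the two lists were produced. The sketch below assumes GNU md5sum run over a directory tree, with the output sorted. The scratch directory, the `data` subdirectory and the files `a.txt` and `b.txt` are hypothetical stand-ins, not the real data:

```bash
#!/bin/bash
# Sketch, not the post's actual setup: build a sorted md5sum list.
# Assumes GNU coreutils (md5sum, sort); the sample files are throwaway.
set -eu

work=$(mktemp -d)                 # scratch area, leaves real lists alone
mkdir "$work/data"                # hypothetical stand-in for the real tree
printf 'hello\n' > "$work/data/a.txt"
printf 'world\n' > "$work/data/b.txt"

# md5sum prints one "<checksum>  <path>" line per file (two spaces);
# sort then orders the lines by checksum.
find "$work/data" -type f -exec md5sum {} + | sort > "$work/ORIG-sorted.txt"

cat "$work/ORIG-sorted.txt"
```

Each line holds a 32-hex-digit checksum, two spaces, then the path. The scripts in this post split on exactly that layout.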
In bash, that approach spells out as:

# Reads the md5sum list line by line from standard input:
while read -r line
do
        # isolate md5sum from line:
        md5=$(echo "$line" | awk '{print $1}')
        # Is this md5 in the second file?
        if grep -q "$md5" RESTORE-sorted.txt
        then
                # If so, throw it out, we don't consider it anymore:
                grep -v "$md5" RESTORE-sorted.txt > RESTORE.mv
                mv RESTORE.mv RESTORE-sorted.txt
        fi
done

Pretty short and sweet. However, it runs forever on a 650 MB file. That probably has less to do with the kernel's handling of file descriptors than with the loop itself. For every line it starts echo, awk and grep, scans the second file, and on a match rewrites all of it, so the work grows with the product of the two file sizes. I started it 6 hours ago and it has not done even half of the task. In fact, while it was running I was able to pick up the Perl needed to do the same thing with associative arrays (hashes), which read each file only once. (Perl is quite "intuitive"; you can sort of "baby-talk" your way into it.) The prog is not quite as short and sweet, but that is probably due to my newbieness. However, it takes 10 seconds to run. Well, that does illustrate a point, does it not...

#!/usr/bin/perl
use strict;
use warnings;

# Read the original file into a hash: md5sum => rest of the line (filename).
my $orig_file = "ORIG-sorted.txt";
my %orig_a;
open(ORIG, '<', $orig_file) || die("Could not open $orig_file: $!");
while (<ORIG>)
{
        # Limit the split to two fields so filenames with spaces stay intact.
        my ($key, $value) = split(/ +/, $_, 2);
        $orig_a{$key} = $value;
}
close(ORIG);

# Open the next file:
my $restore_file = "RESTORE-sorted.txt";
open(RESTORE, '<', $restore_file) || die("Could not open $restore_file: $!");
while (<RESTORE>)
{
        # Split the line in two parts...
        my @record = split(/ +/, $_, 2);
        ## ...and delete the entry with this md5sum from the original hash:
        ## (the central task)
        delete $orig_a{$record[0]};
}
close(RESTORE);

# Print what is left (entries of the original whose md5sum is not in
# RESTORE), sorted by md5sum and separated by two spaces like md5sum does:
foreach my $key (sort keys %orig_a)
{
        print $key, "  ", $orig_a{$key};
}

Monday, December 03, 2007