Wednesday 27 October 2010

Quick and dirty scripting with Python


I came home from work today and, instead of taking a nap (which would have come in really handy) or playing Mass Effect 2, I started coding a couple of scripts to help me organise some files that I've got on several Micro SD cards.

Keep in mind that I've omitted the imports, the "const-like" variable declarations and the if __name__=="__main__" statement for the sake of brevity.

The first one just lists the files and subdirectories under a given path and saves the listing to a text file. Python also provides the function os.walk (or the deprecated os.path.walk), but I got it into my head that I wanted to indent the files according to their depth from the root path, and the easiest way to accomplish this was using os.listdir, which worked like a charm =)

def extract_dir_files(arg, dirname):
    dir_str = "%s/%s/\n" % (" " * arg[0], os.path.split(dirname)[1])
    arg[1].write(dir_str)
    # os.listdir just returns a list containing all files and directories
    # contained within the path 'dirname'
    for item in os.listdir(dirname):
        new_path = os.path.join(dirname, item)
        args = [arg[0] + 1, arg[1]]
        if os.path.isdir(new_path):
            # The child directories will make recursive calls
            # until the root_path directory has been completely parsed
            extract_dir_files(args, new_path)
        else:
            name, ext = os.path.splitext(item)
            # do not write out the files having one of the
            # extensions specified in IGNORE_EXT
            if ext not in IGNORE_EXT:
                file_str = "%s-%s\n" % (" " * (arg[0] + 1), item)
                arg[1].write(file_str)
                print(file_str)


def main():
    root_path = os.path.join(DRIVE_LETTER, ROOT_PATH)
    with open(OUTPUT_FILE, 'a') as f:
        f.write("*" * PAD_NUM + "\n")
        # args[0] will hold the value of the current depth in the dir.
        # hierarchy; args[1], the file object
        args = [0, f]
        extract_dir_files(args, root_path)
        f.write("*" * PAD_NUM + "\n")
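For comparison, here is a rough sketch of how the same kind of indented listing could be produced with os.walk instead of os.listdir, by deriving the depth from each directory's path relative to the root. This is not the script above: list_tree and its ignore_ext parameter are hypothetical names made up for this example, it is written for Python 3, and it lists a directory's files before its subdirectories instead of in os.listdir order.

```python
import os


def list_tree(root_path, ignore_ext=()):
    """Return indented lines for every directory and file under root_path.

    Hypothetical helper: the depth is computed from the path relative to
    root_path, so no depth counter has to be passed around.
    """
    root_path = os.path.abspath(root_path)
    lines = []
    for dirpath, dirnames, filenames in os.walk(root_path):
        # Sorting in place fixes the order in which os.walk descends.
        dirnames.sort()
        rel = os.path.relpath(dirpath, root_path)
        # The root itself is depth 0; "a" is depth 1, "a/b" is depth 2, ...
        depth = 0 if rel == os.curdir else rel.count(os.sep) + 1
        lines.append("%s/%s/" % (" " * depth, os.path.basename(dirpath)))
        for name in sorted(filenames):
            # Skip files whose extension is in ignore_ext (e.g. ".tmp").
            if os.path.splitext(name)[1] not in ignore_ext:
                lines.append("%s-%s" % (" " * (depth + 1), name))
    return lines
```

Joining the returned list with newlines and writing it to a file would give a listing with the same "/dir/" and "-file" markers as the os.listdir version.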




The second one fetches the HTML contents of a given URL and parses them into a DOM-like object. After that, all that's left is to extract the data of interest and do whatever you want with it. In my case, I decided to write it to a *.CSV file for future use. I used the library available at http://www.boddie.org.uk/python/libxml2dom.html

It was pretty easy to use, and it allows DOM-like processing of HTML pages, which generally aren't well-formed. Markup like that causes complaints and exceptions when using better-known Python DOM libraries.

def main():
    # URL returns a list of links, of which I'll only be
    # interested in their text values.
    # FROM and TO are the values I want to pass to the query
    # as GET parameters.
    with contextlib.closing(urllib.urlopen(URL % (FROM, TO))) as url:
        # this assumes the Content-type header carries a charset parameter
        encoding = url.headers['Content-type'].split('charset=')[1]
        contents = url.read().decode(encoding).encode('utf-8')
    doc = libxml2dom.parseString(contents, html=1)
    links = doc.getElementsByTagName("a")
    results = []
    for l in links:
        release = l.childNodes[0].nodeValue
        #...then do whatever you want with it :P
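The CSV step isn't shown above, so here is a minimal sketch of what "extract the link texts and write them out" might look like. To keep it self-contained it swaps libxml2dom for the html.parser and csv modules from the Python 3 standard library, so it is not the code I actually ran. LinkTextParser, link_texts_to_csv, charset_from_content_type and the "release" column header are all hypothetical names invented for the example, and the helper for the charset simply falls back to a default when the Content-Type header carries no charset parameter.

```python
import csv
from email.message import Message
from html.parser import HTMLParser


def charset_from_content_type(value, default="utf-8"):
    """Pull the charset out of a Content-Type header value, if there is one."""
    msg = Message()
    msg["Content-Type"] = value
    return msg.get_content_charset() or default


class LinkTextParser(HTMLParser):
    """Collect the text inside every <a> element, tolerating sloppy markup."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.link_texts = []
        self._inside_link = False
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._inside_link = True
            self._chunks = []

    def handle_endtag(self, tag):
        if tag == "a" and self._inside_link:
            self.link_texts.append("".join(self._chunks).strip())
            self._inside_link = False

    def handle_data(self, data):
        # Text outside of links is ignored.
        if self._inside_link:
            self._chunks.append(data)


def link_texts_to_csv(html, out_file):
    """Write one CSV row per link text to out_file and return the texts."""
    parser = LinkTextParser()
    parser.feed(html)
    parser.close()
    writer = csv.writer(out_file)
    writer.writerow(["release"])  # assumed column name
    for text in parser.link_texts:
        writer.writerow([text])
    return parser.link_texts
```

With a real file, out_file would be something like open("releases.csv", "w", newline=""), which is how the csv module expects its files to be opened; the file name is only an illustration.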


