
Wednesday, 27 October 2010

Quick and dirty scripting with Python



I came home from work today and, instead of taking a nap (something that would have come in really handy) or playing Mass Effect 2, I started coding a couple of scripts to help me organise some files that I keep on several microSD cards.

Keep in mind that, for the sake of brevity, I've omitted the imports, the "const-like" variable declarations and the if __name__ == "__main__" block.

The first one lists the files and subdirectories under a given path and saves the listing to a text file. Python also provides os.walk (and the deprecated os.path.walk, which is gone in Python 3), but I'd got it into my head that I wanted to indent each file according to its depth below the root path, and the easiest way I found to do that was a recursive function built on os.listdir, which worked like a charm =)

def extract_dir_files(arg, dirname):
    # arg[0] is the current depth, arg[1] the open output file
    dir_str = "%s/%s/\n" % (" " * arg[0], os.path.split(dirname)[1])
    arg[1].write(dir_str)

    # os.listdir just returns a list containing all the files and
    # directories contained within the path 'dirname'
    for item in os.listdir(dirname):
        new_path = os.path.join(dirname, item)
        args = [arg[0] + 1, arg[1]]
        if os.path.isdir(new_path):
            # Child directories make recursive calls until the
            # root_path directory has been completely parsed
            extract_dir_files(args, new_path)
        else:
            _, ext = os.path.splitext(item)
            # Don't write out files having one of the extensions
            # specified in IGNORE_EXT
            if ext not in IGNORE_EXT:
                file_str = "%s-%s\n" % (" " * (arg[0] + 1), item)
                arg[1].write(file_str)
                # Strip the newline so the console doesn't get a blank
                # line after every entry
                print(file_str.rstrip("\n"))


def main():
    root_path = os.path.join(DRIVE_LETTER, ROOT_PATH)
    with open(OUTPUT_FILE, 'a') as f:
        f.write("*" * PAD_NUM + "\n")
        # args[0] holds the current depth in the directory
        # hierarchy; args[1], the file object
        args = [0, f]
        extract_dir_files(args, root_path)
        f.write("*" * PAD_NUM + "\n")
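For comparison, here's a rough Python 3 sketch of the os.walk route mentioned earlier. The depth is worked out from each directory's path relative to the root instead of being passed down through recursive calls. The function name list_tree_with_walk and the sample IGNORE_EXT values are hypothetical stand-ins, not part of the original script. One visible difference: this version writes a directory's files before descending into its subdirectories, so files come ahead of subdirectories instead of following os.listdir order.

```python
import os

# Hypothetical stand-in for the post's IGNORE_EXT constant.
IGNORE_EXT = {".db", ".tmp"}


def list_tree_with_walk(root_path, out, ignore_ext=IGNORE_EXT):
    """Write an indented listing of root_path to the file-like object `out`.

    Depth comes from how many path components a directory sits below
    root_path, so no explicit recursion is needed.
    """
    root_path = os.path.normpath(root_path)
    for dirpath, dirnames, filenames in os.walk(root_path):
        # Sorting in place makes os.walk visit subdirectories alphabetically.
        dirnames.sort()
        rel = os.path.relpath(dirpath, root_path)
        depth = 0 if rel == os.curdir else rel.count(os.sep) + 1
        out.write("%s/%s/\n" % (" " * depth, os.path.basename(dirpath)))
        for name in sorted(filenames):
            # Skip files whose extension is in the ignore list.
            if os.path.splitext(name)[1] not in ignore_ext:
                out.write("%s-%s\n" % (" " * (depth + 1), name))
```

Calling list_tree_with_walk(root_path, f) inside the same with open(...) block as main() should give the same indentation scheme as the recursive version.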




The second one fetches the HTML contents of a given URL and parses them into a DOM-like object. After that, all that's left is to extract the data of interest and do whatever you want with it. In my case, I decided to write it to a .csv file for future use. For the parsing I used the libxml2dom library, available at http://www.boddie.org.uk/python/libxml2dom.html

It was pretty easy to use, and it allows DOM-like processing of HTML pages, which generally aren't well-formed. That lack of well-formedness is exactly what makes the better-known Python DOM libraries, which expect well-formed XML, complain and throw exceptions.

def main():
    # URL returns a list of links, of which I'm only interested in
    # their text values. FROM and TO are the values I want to pass
    # to the query as GET parameters.
    with contextlib.closing(urllib.urlopen(URL % (FROM, TO))) as url:
        content_type = url.headers.get('Content-type', '')
        encoding = 'utf-8'  # fallback when the server declares no charset
        if 'charset=' in content_type:
            encoding = content_type.split('charset=')[1].split(';')[0].strip(' "')
        contents = url.read().decode(encoding).encode('utf-8')
    doc = libxml2dom.parseString(contents, html=1)
    links = doc.getElementsByTagName("a")
    results = []
    for l in links:
        release = l.childNodes[0].nodeValue
        results.append(release)
        #...then do whatever you want with them :P
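The original split('charset=')[1] trick raises IndexError whenever the server doesn't declare a charset. On Python 3 the standard library can parse the header parameter for you. Here's a small sketch (the helper name charset_from_content_type is hypothetical) that wraps email.message.Message.get_content_charset and falls back to a default.

```python
from email.message import Message


def charset_from_content_type(content_type, default="utf-8"):
    """Return the (lower-cased) charset declared in a Content-Type value.

    Falls back to `default` when there is no charset parameter, instead of
    raising IndexError the way split('charset=')[1] does.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset(default)
```

With Python 3's urllib.request.urlopen you can skip the helper: the response's headers object is an email.message.Message subclass, so response.headers.get_content_charset('utf-8') does the same job.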
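If you'd rather not depend on libxml2dom, the standard library's html.parser module (Python 3) also copes with markup that isn't well-formed, although it's an event-driven parser rather than a DOM. This is a swapped-in technique, sketched under the assumption that you only want the text of each <a> element; the class and function names are hypothetical. Unlike l.childNodes[0].nodeValue, which reads only the first child node, it joins all the text inside a link, including text in nested tags such as <b>.

```python
from html.parser import HTMLParser


class LinkTextParser(HTMLParser):
    """Collect the text inside every <a> element.

    HTMLParser doesn't require well-formed markup, so unclosed tags or
    stray end tags don't raise exceptions the way an XML parser would.
    """

    def __init__(self):
        super().__init__()
        self.links = []        # finished link texts
        self._current = None   # text pieces of the <a> being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._current = []

    def handle_data(self, data):
        # Only keep text that appears while we're inside an <a>.
        if self._current is not None:
            self._current.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._current is not None:
            self.links.append("".join(self._current).strip())
            self._current = None


def link_texts(html):
    """Return the text of every link in an HTML string."""
    parser = LinkTextParser()
    parser.feed(html)
    parser.close()
    return parser.links
```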
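The post doesn't show the CSV step itself. Here's one way it could look with Python 3's csv module, assuming one release name per row; the function name and the "release" header are hypothetical choices. Opening the file with newline="" follows the csv module's documented recommendation, and csv.writer takes care of quoting values that contain commas or quotes.

```python
import csv


def write_releases_csv(path, releases):
    """Write a header row followed by one release name per row to `path`."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["release"])  # hypothetical header name
        for release in releases:
            writer.writerow([release])
```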