I came home from work today and, instead of taking a nap (something that would have come in really handy) or playing Mass Effect 2, I started coding a couple of scripts to help me organise some files that I've got on several microSD cards.
Keep in mind that I've omitted the imports, the "const-like" variable declarations and the if __name__ == "__main__" statement for the sake of brevity.
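(If you want to run the snippets, the omitted bits would look something like this; the concrete values are just placeholders, not necessarily the ones I used.)

import os
import contextlib
import urllib
import libxml2dom

# Placeholder values; adjust them to your own setup
DRIVE_LETTER = "E:\\"
ROOT_PATH = "Music"
OUTPUT_FILE = "sd_card_contents.txt"
PAD_NUM = 40                            # width of the '*' separator lines
IGNORE_EXT = (".tmp", ".db")            # extensions to skip in the listing
URL = "http://www.example.com/search?from=%s&to=%s"   # made-up URL taking two GET parameters
FROM, TO = "2009", "2010"

if __name__ == "__main__":
    main()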
The first one just lists the files and subdirectories under a given path, and then saves the listing to a text file. Python also provides the function os.walk (or the deprecated os.path.walk), but I got it into my head that I wanted to indent the files according to their depth from the root path, and the easiest way to accomplish this was using os.listdir, which worked like a charm =)
def extract_dir_files(arg, dirname):
    dir_str = "%s/%s/\n" % (" " * arg[0], os.path.split(dirname)[1])
    arg[1].write(dir_str)
    # os.listdir just returns a list containing all files and directories
    # contained within the path 'dirname'
    for item in os.listdir(dirname):
        new_path = os.path.join(dirname, item)
        args = [arg[0] + 1, arg[1]]
        if os.path.isdir(new_path):
            # The child directories will make recursive calls
            # until the root_path directory has been completely parsed
            extract_dir_files(args, new_path)
        else:
            name, ext = os.path.splitext(item)
            # do not write out the files having one of the
            # extensions specified in IGNORE_EXT
            if ext not in IGNORE_EXT:
                file_str = "%s-%s\n" % (" " * (arg[0] + 1), item)
                arg[1].write(file_str)
                print(file_str)

def main():
    root_path = os.path.join(DRIVE_LETTER, ROOT_PATH)
    with open(OUTPUT_FILE, 'a') as f:
        f.write("*" * PAD_NUM + "\n")
        # args[0] will hold the value of the current depth in the dir.
        # hierarchy; args[1], the file object
        args = [0, f]
        extract_dir_files(args, root_path)
        f.write("*" * PAD_NUM + "\n")
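Just for comparison, the same kind of indented listing could also be produced with os.walk, the alternative I mentioned above. This is only a rough sketch (and the order in which files and subdirectories come out won't be exactly the same as with the recursive version):

def extract_dir_files_walk(root_path, out_file):
    # os.walk yields a (dirpath, dirnames, filenames) tuple for every
    # directory under root_path, so the depth can be derived from the
    # number of path separators relative to the root
    root_depth = root_path.rstrip(os.sep).count(os.sep)
    for dirpath, dirnames, filenames in os.walk(root_path):
        depth = dirpath.rstrip(os.sep).count(os.sep) - root_depth
        out_file.write("%s/%s/\n" % (" " * depth, os.path.split(dirpath)[1]))
        for item in filenames:
            # skip the extensions listed in IGNORE_EXT, as before
            if os.path.splitext(item)[1] not in IGNORE_EXT:
                out_file.write("%s-%s\n" % (" " * (depth + 1), item))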
The second one gets the HTML contents of a given URL and parses them into a DOM-like object. After that, all that's left is to extract the data of interest and do whatever you want with it. In my case, I decided to write it to a .csv file for future use. I used the library available at http://www.boddie.org.uk/python/libxml2dom.html
It was pretty easy to use, and it allows for DOM-like processing of HTML pages, which generally aren't well-formed and therefore cause complaints and exceptions in better-known Python DOM libraries.
def main():
    # URL returns a list of links, of which I'll only be
    # interested in their text values.
    # FROM and TO are the values I want to pass to the query
    # as GET parameters.
    with contextlib.closing(urllib.urlopen(URL % (FROM, TO))) as url:
        encoding = url.headers['Content-type'].split('charset=')[1]
        contents = url.read().decode(encoding).encode('utf-8')

    doc = libxml2dom.parseString(contents, html=1)
    links = doc.getElementsByTagName("a")
    results = []
    for l in links:
        release = l.childNodes[0].nodeValue
        # ...then do whatever you want with it :P
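The CSV part isn't shown above, but with the csv module it boils down to something like this (a minimal sketch; the file name and the one-value-per-row layout are just assumptions):

import csv

def write_results(results, csv_path):
    # one release per row; 'wb' because Python 2's csv module wants a
    # binary-mode file to avoid extra blank lines on Windows
    with open(csv_path, 'wb') as csv_file:
        writer = csv.writer(csv_file)
        for release in results:
            writer.writerow([release])

Inside the loop above you'd just do results.append(release), and then call write_results(results, "releases.csv") once all the links have been processed.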