This page collects snippets of Python code for common data processing tasks. Below each snippet is an IPython %loadpy magic command, which can be used like this:

In [1]: %loadpy

The %loadpy IPython magic function loads the content of a script into your IPython prompt without executing it. This makes %loadpy handy when you want to make a quick edit to a remote or local Python script before running it. If you don't need to make any changes, just press Enter once the script has loaded. (In newer IPython releases the same magic is available as %load.)

Code Snippets

»  Return only the unique elements of a list while preserving the order of the original list.

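def uniquify(myList, idfun=None):
    # idfun maps each item to the key used to decide whether it is a duplicate.
    if idfun is None:
        def idfun(x): return x
    seen, result = {}, []
    for item in myList:
        marker = idfun(item)
        if marker in seen: continue
        seen[marker] = 1
        result.append(item)  # keep only the first occurrence of each key
    return result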
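
As a point of comparison, here is a minimal sketch of the same idea for newer interpreters. It assumes Python 3.7 or later, where dict preserves insertion order; the name uniquify_by_key and its key parameter are hypothetical and simply play the role of idfun above.

```python
def uniquify_by_key(items, key=None):
    """Order-preserving de-duplication; `key` plays the role of idfun."""
    if key is None:
        # Assumption: Python 3.7+, where dict keys keep insertion order.
        return list(dict.fromkeys(items))
    seen = set()
    result = []
    for item in items:
        marker = key(item)
        if marker not in seen:
            seen.add(marker)
            result.append(item)  # keep the first item seen for each key
    return result


print(uniquify_by_key([3, 1, 3, 2, 1]))                      # [3, 1, 2]
print(uniquify_by_key(['a', 'A', 'b', 'B'], key=str.lower))  # ['a', 'b']
```

As with uniquify, the items (or the keys derived from them) must be hashable.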

»  Remove all the HTML tags from a string.

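def striptags(raw_html):
    # tag[0] is True while we are inside a <...> tag; a one-element list
    # lets the nested function update the flag.
    tag = [False]
    def checkit(i):
        if tag[0]:
            tag[0] = (i != '>')  # a '>' closes the current tag
            return False
        elif i == '<':
            tag[0] = True
            return False
        return True  # ordinary character outside any tag
    return ''.join(i for i in raw_html if checkit(i))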
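
The character scan above treats every '>' as the end of a tag, so a '>' inside a quoted attribute value is handled as ordinary text. Where that matters, one alternative is to let the standard library parse the markup. The sketch below assumes Python 3 (html.parser); the names _TextExtractor and striptags_parser are hypothetical. Note that, unlike striptags, it also decodes character references such as &amp;.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects only the text that appears between tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def striptags_parser(raw_html):
    parser = _TextExtractor()
    parser.feed(raw_html)
    parser.close()  # flush any text still buffered by the parser
    return ''.join(parser.parts)


print(striptags_parser('<p>Hello <b>world</b>!</p>'))  # Hello world!
```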

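»  Create a list of file names in a directory dirname. Set subdir = True (False) to include (exclude) files in sub-directories of dirname. Any additional arguments are file extensions, given without the leading dot (e.g. 'txt'), that restrict the list to files of those types.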

from os import listdir,path

def filelist(dirname, subdir, *args):
    f = []
    for i in listdir(dirname):
        d = path.join(dirname, i)
        if path.isfile(d):
            if len(args) == 0: f.append(d)
            elif path.splitext(d)[1][1:] in args: f.append(d)
        elif path.isdir(d) and subdir: f += filelist(d, subdir, *args)
    return f
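
The same listing can also be built on os.walk instead of explicit recursion. This is a sketch rather than a drop-in replacement: the name filelist_walk is hypothetical, and the order of the returned paths may differ from filelist.

```python
import os


def filelist_walk(dirname, subdir, *exts):
    """List files under dirname, optionally descending into sub-directories.

    exts are extensions without the leading dot; none given means "all files".
    """
    found = []
    for root, dirs, files in os.walk(dirname):
        if not subdir:
            dirs[:] = []  # prune in place so os.walk does not descend further
        for name in files:
            if not exts or os.path.splitext(name)[1][1:] in exts:
                found.append(os.path.join(root, name))
    return found
```

For example, filelist_walk('.', True, 'txt', 'csv') would collect every .txt and .csv file below the working directory.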

»  Merge the files in a list of files myFileList into a single file outputFile.

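def mergefiles(myFileList, outputFile):
    # Append the content of each file in myFileList to outputFile, in order.
    with open(outputFile, 'w') as g:
        for i in myFileList:
            print('Writing file: %s' % i)
            with open(i) as f:
                g.write(f.read())
    print('File created: %s' % outputFile)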

It is often useful to combine the previous two functions into a single file-merging procedure. To do so, first cd into the directory your files are in (dirname), then run:

In [2]: %loadpy

In [3]: %loadpy

In [4]: mergefiles(filelist('.', False, 'txt'), 'myOutputFile.txt')

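This command creates the file myOutputFile.txt, which contains the lines of all .txt files in the working directory. Note that if myOutputFile.txt already exists in that directory, it matches the 'txt' filter itself, so remove it or write the output elsewhere before re-running the command.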
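
For use outside an interactive session, the two steps can be folded into one function. The following is a minimal sketch under a few assumptions: the name merge_by_extension is hypothetical, only the top-level directory is searched, files are merged in sorted name order, and the output file is skipped if it happens to match the pattern.

```python
import glob
import os
import shutil


def merge_by_extension(dirname, ext, output_file):
    """Concatenate every *.ext file in dirname into output_file."""
    out_abs = os.path.abspath(output_file)
    # Skip the output file itself so that re-running does not merge it in.
    sources = sorted(
        p for p in glob.glob(os.path.join(dirname, '*.' + ext))
        if os.path.abspath(p) != out_abs
    )
    with open(output_file, 'wb') as out:
        for p in sources:
            with open(p, 'rb') as src:
                shutil.copyfileobj(src, out)  # copy in chunks, not all at once
    return sources
```

A call such as merge_by_extension('.', 'txt', 'myOutputFile.txt') would then correspond to the command above. Copying in binary mode leaves line endings and encodings untouched.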