October 05, 2006
The joy of os.walk()

One thing that came up yesterday was that people still aren't feeling comfortable with os.walk(). Which is a shame - I love it. I mentioned that I'd used it only that day, in a nice little script that locates malformed XML in a directory tree, and Simon suggested that I post it. So, here it is:

for path, dirs, files in os.walk(os.getcwd()):
    for xml in [os.path.abspath(os.path.join(path, filename)) for filename in files if fnmatch.fnmatch(filename, '*.xml')]:
        try:
            ElementTree.parse(xml)
        except (SyntaxError, ExpatError):
            print xml, "\tBADLY FORMED!"

Syntax highlighted version, with imports: find_dodgy_xml.py.
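For anyone who hasn't seen them, the imports for a script like this tend to look something like the sketch below. This is the usual idiom, not necessarily the exact contents of find_dodgy_xml.py: the fallback to the standalone elementtree package is an assumption.

```python
import fnmatch
import os
from xml.parsers.expat import ExpatError

try:
    # Python 2.5 and later ship ElementTree in the standard library.
    from xml.etree import ElementTree
except ImportError:
    # Assumed fallback: the standalone elementtree package on older Pythons.
    from elementtree import ElementTree
```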

Notice that it takes more code to import ElementTree than it does to do the actual work! It'll be nice when we can rely on Python 2.5, which ships ElementTree in the standard library, but that's a while away.

Hmm, actually, this is such a common pattern that it's probably worth a helper function:

def locate(pattern, root=os.curdir):
    # os.curdir is resolved when the generator runs, not when the function
    # is defined, so the default follows the current working directory.
    for path, dirs, files in os.walk(root):
        for filename in fnmatch.filter(files, pattern):
            yield os.path.abspath(os.path.join(path, filename))

Syntax highlighted version: locate.py.

This simplifies the main loop to:

for xml in locate("*.xml"):
    try:
        ElementTree.parse(xml)
    except (SyntaxError, ExpatError):
        print xml, "\tBADLY FORMED!"
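On a current Python 3 the same idea could be sketched roughly as follows. This is an updated illustration under stated assumptions, not the original script: find_bad_xml is a hypothetical helper name, and it catches ElementTree.ParseError, which is what the standard-library parser raises for malformed input on Python 3.2 and later.

```python
import fnmatch
import os
from xml.etree import ElementTree


def locate(pattern, root=os.curdir):
    """Yield absolute paths of files under root whose names match pattern."""
    for path, dirs, files in os.walk(root):
        for filename in fnmatch.filter(files, pattern):
            yield os.path.abspath(os.path.join(path, filename))


def find_bad_xml(root=os.curdir):
    """Hypothetical helper: list the *.xml files under root that fail to parse."""
    bad = []
    for xml in locate("*.xml", root):
        try:
            ElementTree.parse(xml)
        except ElementTree.ParseError:
            # Malformed XML: remember the path and carry on with the rest.
            bad.append(xml)
    return bad
```

Printing each path returned by find_bad_xml() followed by "BADLY FORMED!" gives much the same report as the loop above.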

Worth a cookbook recipe, or is it too simple?

Posted to Python by Simon Brunning at October 05, 2006 04:20 PM