October 05, 2006
The joy of os.walk()

One thing that came up yesterday was that people still aren't feeling comfortable with os.walk(). Which is a shame - I love it. I mentioned that I'd used it only that day, in a nice little script that locates malformed XML in a directory tree, and Simon suggested that I post it. So, here it is:

for path, dirs, files in os.walk(os.getcwd()):
    for xml in [os.path.abspath(os.path.join(path, filename)) for filename in files if fnmatch.fnmatch(filename, '*.xml')]:
        try:
            ElementTree.parse(xml)
        except (SyntaxError, ExpatError):
            print xml, "\tBADLY FORMED!"

Syntax highlighted version, with imports: find_dodgy_xml.py.

Notice that it takes more code to import ElementTree than it does to do the actual work! It'll be nice when we can rely on version 2.5 being available, but that's a while away.
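For comparison, here is a minimal sketch of how the whole script might look once ElementTree can be imported straight from the standard library (it ships as xml.etree.ElementTree from Python 2.5 onward). It is written for Python 3, which is an assumption on my part, and the function name find_malformed_xml is made up for illustration. In Python 3 a parse failure raises ElementTree.ParseError, so that is the only exception caught here.

```python
import fnmatch
import os
from xml.etree import ElementTree


def find_malformed_xml(root):
    """Yield the absolute path of each *.xml file under root that fails to parse."""
    for path, dirs, files in os.walk(root):
        for filename in files:
            if not fnmatch.fnmatch(filename, '*.xml'):
                continue
            xml = os.path.abspath(os.path.join(path, filename))
            try:
                ElementTree.parse(xml)
            except ElementTree.ParseError:
                # The parser rejected the file, so report it.
                yield xml


if __name__ == '__main__':
    for xml in find_malformed_xml(os.getcwd()):
        print(xml, "\tBADLY FORMED!")
```

The import section is now a single line for ElementTree, with no fallback chain needed.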

Hmm, actually, this is such a common pattern that it's probably worth a helper function:

def locate(pattern, root=os.getcwd()):
    for path, dirs, files in os.walk(root):
        for filename in [os.path.abspath(os.path.join(path, filename)) for filename in files if fnmatch.fnmatch(filename, pattern)]:
            yield filename

Syntax highlighted version: locate.py.

This simplifies the main loop to:

for xml in locate("*.xml"):
    try:
        ElementTree.parse(xml)
    except (SyntaxError, ExpatError):
        print xml, "\tBADLY FORMED!"

Worth a cookbook recipe, or is it too simple?

Posted to Python by Simon Brunning at October 05, 2006 04:20 PM
Comments

Definitely worth a cookbook entry.

Posted by: Paul Moore on October 5, 2006 04:49 PM

Not too simple, IMO! But personally, I think Jason Orendorff's path module is the nicest way to do this sort of thing. Using path, your locate() call would look like:

from path import path
for xml in path('.').walkfiles('*.xml'): ...

Plus much more. It should be in the standard library, if you ask me. :-)

http://www.jorendorff.com/articles/python/path/

Graham

Posted by: Graham Fawcett on October 5, 2006 04:55 PM

Perhaps, but path *won't* be in the standard library, at least not in its current form. Guido has spoken. The path module does several different things - it deals with paths as strings, as well as actual files and directories. It should be broken up. Also, using "/" as a join operator isn't very popular.

Posted by: Simon on October 5, 2006 04:59 PM

I think you'd be better off using os.path.curdir instead of os.getcwd(), especially for the default argument value.

Also, I think that if you do

root = os.path.abspath(root)

once before the loop, you won't have to call abspath on every file you find.

And thirdly, I'm pretty sure you want to yield os.path.join(path, filename) rather than just raw filename.

Posted by: Marius Gedminas on October 5, 2006 08:27 PM

> os.path.curdir instead of os.getcwd()

You mean os.curdir? Good point. (This will mean that if the current directory changes between when the function is defined and when it is run, the newer current directory will be used.)
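To make that parenthetical concrete, here's a small sketch of the difference. Python evaluates default argument values once, when the def statement runs, so os.getcwd() is frozen at that point, whereas os.curdir is just the string '.' and is only resolved when it is actually used. The function names frozen_root and late_root are made up for the illustration, and the code assumes Python 3.

```python
import os
import tempfile

start = os.getcwd()


def frozen_root(root=os.getcwd()):
    # The default was computed once, when this def statement ran.
    return os.path.abspath(root)


def late_root(root=os.curdir):
    # os.curdir is just '.', resolved against the current directory at call time.
    return os.path.abspath(root)


with tempfile.TemporaryDirectory() as elsewhere:
    os.chdir(elsewhere)
    try:
        frozen = frozen_root()  # still the directory we started in
        late = late_root()      # follows the chdir
    finally:
        os.chdir(start)         # always go back before the directory is removed

print("getcwd() default:", frozen)
print("curdir default:  ", late)
```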

> root = os.path.abspath(root)

Good idea.

> os.path.join(path, filename) rather than just raw filename

This was being done in the list comp. It's probably a better idea to move it out to the yield for readability. Also, I can use a generator expression rather than a list comp.

An updated version looks like:

def locate(pattern, root=os.curdir):
    for path, dirs, files in os.walk(os.path.abspath(root)):
        for filename in (filename for filename in files if fnmatch.fnmatch(filename, pattern)):
            yield os.path.join(path, filename)

Thanks, Marius.

Posted by: Simon on October 6, 2006 01:38 PM

def locate(pattern, root=os.curdir):
    for path, dirs, files in os.walk(os.path.abspath(root)):
        for filename in fnmatch.filter(files, pattern):
            yield os.path.join(path, filename)

is yet more succinct.
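A quick sanity check, as a sketch: fnmatch.filter(names, pattern) returns the names that match, which is the same thing the explicit fnmatch.fnmatch test does one name at a time. The throwaway directory tree and the name locate_with_fnmatch are made up for illustration, and the code assumes Python 3.

```python
import fnmatch
import os
import tempfile


def locate(pattern, root=os.curdir):
    """Yield the absolute path of every file under root whose name matches pattern."""
    for path, dirs, files in os.walk(os.path.abspath(root)):
        for filename in fnmatch.filter(files, pattern):
            yield os.path.join(path, filename)


def locate_with_fnmatch(pattern, root=os.curdir):
    """The earlier version, testing name by name, kept here for comparison."""
    for path, dirs, files in os.walk(os.path.abspath(root)):
        for filename in (f for f in files if fnmatch.fnmatch(f, pattern)):
            yield os.path.join(path, filename)


# Build a small throwaway tree (hypothetical file names) and compare the two.
with tempfile.TemporaryDirectory() as tmp:
    os.mkdir(os.path.join(tmp, 'sub'))
    for name in ('a.xml', 'b.txt', os.path.join('sub', 'c.xml')):
        open(os.path.join(tmp, name), 'w').close()
    found = sorted(locate('*.xml', tmp))
    found_old = sorted(locate_with_fnmatch('*.xml', tmp))
    # Paths relative to the root, for easier reading.
    names = [os.path.relpath(p, os.path.abspath(tmp)) for p in found]

print(names)
```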

Posted by: Tom Lynn on October 12, 2006 11:12 AM

Oooh, nice, thanks Tom.

Posted by: Simon on October 12, 2006 11:32 AM
