Small Values of Cool: A quick look at ElementTree

February 04, 2005

A quick look at ElementTree

Lumigent Log Explorer is a potential life saver, as I've said before, but there are some oddities with it. All the data is there, but we've had trouble getting exactly what we want from it in exactly the way that we want it.

No matter - it has a facility to export all your raw transactions to XML. Sorted! (Well, we had to fix a couple of very minor issues before the XML was well formed - & characters were not escaped to &amp;, and we needed to add an encoding declaration. Still, not far off.)
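For illustration, here is a minimal sketch of that sort of clean-up, written for current Python. It is not the fix we actually applied. The function name `repair_export` and the default encoding are assumptions made up for the example, so use whatever encoding your export really has.

```python
import re

# An ampersand that does not already start an entity or character reference.
BARE_AMPERSAND = re.compile(r"&(?!(?:amp|lt|gt|quot|apos|#\d+|#x[0-9A-Fa-f]+);)")


def repair_export(text, encoding="iso-8859-1"):
    """Escape bare ampersands and prepend an XML declaration if one is missing.

    Hypothetical helper: the encoding default is a guess, not a fact about
    Log Explorer's output.
    """
    repaired = BARE_AMPERSAND.sub("&amp;", text)
    if not repaired.lstrip().startswith("<?xml"):
        repaired = '<?xml version="1.0" encoding="%s"?>\n' % encoding + repaired
    return repaired


print(repair_export("<NAME>Fish & Chips &amp; peas</NAME>"))
```

The negative lookahead leaves ampersands that are already escaped alone, so running the repair twice does no harm.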

I've not processed much XML, and not done any at all for a while. I was never really comfortable with the Python XML libraries that I'd played with, so I thought I'd give the effbot's ElementTree module a try.

The API is lovely. After no more than five or ten minutes, I felt like I knew what I was doing.

An example. A radically truncated version of the XML output from Log Explorer might look like this - oh-bugger-its-all-gone-a-bit-pete-tong-lets-hope-i-can-recover-the-data-from-this.xml. (The real thing was over seventy MB in size.) Code to loop through all the records and pull out the relevant details (including all the row's data from a sub-element) is as simple as this:

import cElementTree as ElementTree

# Parse XML...
tree = ElementTree.parse("oh-bugger-its-all-gone-a-bit-pete-tong-lets-hope-i-can-recover-the-data-from-this.xml")
root = tree.getroot()

for record in root:
    # Pull out tags
    timestamp = record.findtext('DATETIME')
    opcode = record.findtext('OPCODETXT')
    table = record.findtext('TABLENAME')
    rowdata = dict((column.tag, column.text or '') for column in (record.find('ROWDATA') or []))

    # Complex stuff here...
    print timestamp, opcode, table, rowdata

Nice, eh?
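If you want to try the same loop without a Log Explorer export to hand, the sketch below runs it against a tiny inline document. Treat it as an approximation: it uses `xml.etree.ElementTree` from the standard library of current Python rather than the separate cElementTree download, and the sample XML is invented. Only the DATETIME, OPCODETXT, TABLENAME and ROWDATA tag names come from the code above; the root and record element names, the column names and all the values are assumptions.

```python
import xml.etree.ElementTree as ElementTree

# Invented sample data. Only DATETIME, OPCODETXT, TABLENAME and ROWDATA are
# taken from the post; every other name and value here is hypothetical.
SAMPLE = """<EXPORT>
  <RECORD>
    <DATETIME>2005-02-03 17:45:12</DATETIME>
    <OPCODETXT>DELETE_ROWS</OPCODETXT>
    <TABLENAME>CUSTOMER</TABLENAME>
    <ROWDATA>
      <ID>42</ID>
      <NAME>Fish &amp; Chips</NAME>
      <NOTES/>
    </ROWDATA>
  </RECORD>
  <RECORD>
    <DATETIME>2005-02-03 17:45:13</DATETIME>
    <OPCODETXT>COMMIT_XACT</OPCODETXT>
    <TABLENAME></TABLENAME>
  </RECORD>
</EXPORT>"""


def read_records(root):
    """Return (timestamp, opcode, table, rowdata) tuples, one per child of root."""
    records = []
    for record in root:
        timestamp = record.findtext('DATETIME')
        opcode = record.findtext('OPCODETXT')
        table = record.findtext('TABLENAME')
        # Recent Python versions deprecate truth-testing an element, so test
        # for None explicitly instead of writing "find(...) or []".
        row_element = record.find('ROWDATA')
        columns = row_element if row_element is not None else []
        rowdata = dict((column.tag, column.text or '') for column in columns)
        records.append((timestamp, opcode, table, rowdata))
    return records


records = read_records(ElementTree.fromstring(SAMPLE))
for record in records:
    print(record)
```

A record with no ROWDATA element simply ends up with an empty dictionary, and an empty column comes back as an empty string rather than None.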

Clearly my code did something a bit more complex than just printing out the data, but you get the idea. In fact, I'm rather pleased with the script on the whole. It does an awful lot with not much code - Python's dictionaries, lists and string interpolation do most of the work.
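To give a flavour of what dictionaries and string interpolation buy you, here is a small sketch that turns one record into a line of text. This is not the real script, which isn't shown here. The `describe` function and the sample values are made up for the example.

```python
def describe(timestamp, opcode, table, rowdata):
    """Format one record as a single line of text (hypothetical helper)."""
    # Sort the column names so the output is stable from run to run.
    columns = ", ".join("%s=%r" % (name, rowdata[name]) for name in sorted(rowdata))
    return "%s %s on %s: %s" % (timestamp, opcode, table, columns or "(no row data)")


# Invented sample values, purely for illustration.
line = describe("2005-02-03 17:45:12", "DELETE_ROWS", "CUSTOMER",
                {"NAME": "Fish & Chips", "ID": "42"})
print(line)
```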

Performance? Now, I'm rather wary of venturing into benchmarking territory, so I'll just say that cElementTree goes like stink, and leave it at that.

Frankly, I'm almost always totally uninterested in benchmarks in any case. Software only has two speeds - fast enough, and not fast enough. cElementTree is comfortably in the fast enough range. Beyond that, I honestly couldn't care less.

Posted to Python by Simon Brunning at February 04, 2005 12:14 PM
