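Python minidom - Parsing an XML file and writing to CSV

Date: 2015-03-19 21:49:07

Tags: python xml csv minidom

I am trying to parse an XML file and then write selected objects retrieved from it to a CSV file.

Here is my basic XML file: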

<?xml version="1.0"?>
<library owner="John Q. Reader">
    <book>
        <title>Sandman Volume 1: Preludes and Nocturnes</title>
        <author>Neil Gaiman</author>
    </book>
    <book>
        <title>Good Omens</title>
        <author>Neil Gamain</author>
        <author>Terry Pratchett</author>
    </book>
    <book>
        <title>"Repent, Harlequin!" Said the Tick-Tock Man</title>
        <author>Harlan Ellison</author>
    </book>
</library>

I wrote a basic script using Python 2.7 and minidom. Here it is:


# Test Parser

from xml.dom.minidom import parse

def printLibrary(myLibrary):
    books = myLibrary.getElementsByTagName("book")
    for book in books:
        print "*****Book*****"
        print "Title: %s" % book.getElementsByTagName("title")[0].childNodes[0].data
        for author in book.getElementsByTagName("author"):
            print "Author: %s" % author.childNodes[0].data

doc = parse('library.xml')
myLibrary = doc.getElementsByTagName("library")[0]

# Print each book's title and authors
printLibrary(myLibrary)

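So far, when run from the command line on Win7, this script displays each book's title and authors.

I want to output these results to a CSV file so that it looks like this:

Title, Author
Title, Author
Title, Author
etc.

However, I can't get it to work. I'm very new to Python; I work in IT and SQL, and basic programming is about where I'm at.

Any help would be greatly appreciated!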

1 Answer:

Answer 0: (score: 0)

Use the csv module.

# Test Parser

from xml.dom.minidom import parse
import csv


def writeToCSV(myLibrary):
    # Python 2.7: open in binary mode so the csv module does not emit
    # blank lines between rows on Windows.
    # On Python 3, use open('output.csv', 'w', newline='') instead.
    with open('output.csv', 'wb') as csvfile:
        fieldnames = ['title', 'author']
        writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
        writer.writeheader()

        # Write one row per (title, author) pair, so a book with
        # several authors produces several rows.
        books = myLibrary.getElementsByTagName("book")
        for book in books:
            titleValue = book.getElementsByTagName("title")[0].childNodes[0].data
            for author in book.getElementsByTagName("author"):
                authorValue = author.childNodes[0].data
                writer.writerow({'title': titleValue, 'author': authorValue})

doc = parse('library.xml')
myLibrary = doc.getElementsByTagName("library")[0]

# Write each book's title and authors to output.csv
writeToCSV(myLibrary)

Output file:

title,author
Sandman Volume 1: Preludes and Nocturnes,Neil Gaiman
Good Omens,Neil Gamain
Good Omens,Terry Pratchett
"""Repent, Harlequin!"" Said the Tick-Tock Man",Harlan Ellison