Correctly parsing an XML file for CSV output

Time: 2017-04-27 13:33:13

Tags: python python-2.7

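I have built this script to parse an XML file, retrieve the contents of the id, created-date, author-id and comments nodes, and print them to a CSV. It mostly works, with one exception...

The problem is that the script loops through every id in the XML and, for each one, prints every comment in the whole XML file, whether or not the comment belongs to that id.

Ideally, the end goal is to fetch only the comments that belong to each unique id and print the contents of those comment nodes.

Example of the problem (CSV output):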

Ticket ID,Created Date,Author ID,Comment
992,2016-06-21,232245,"Hi, this is what is happening."
992,2016-06-22,231122,"This is another comment from the same id."
996,2016-06-21,232245,"Hi, this is what is happening."
996,2016-06-22,231122,"This is another comment from the same id."

I only want to print the comments that belong to each id, not all comments for every id (if that makes sense).

Here is the code:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import os
import sys

from xml.etree import ElementTree as ET
import csv

xml_file = sys.argv[1]

if not xml_file.endswith('.xml'):
    print "%s is not a valid XML file. Exiting." % xml_file
    exit()

tree = ET.parse(xml_file)
root = tree.getroot()

# Ignore characters/string(s) (if any)
ignore_chars = ['>', '>>']


class RotateFile(object):
    def __init__(self, directory='', filename='', max_files=sys.maxint,
                 max_file_size='', header=''):
        self.ii = 1
        self.header = header
        self.directory, self.filename = directory, filename
        self.max_file_size, self.max_files = max_file_size, max_files
        self.finished, self.fh = False, None
        self.open()

    def rotate(self):
        """Rotate the file, if necessary"""
        if (os.stat(self.filename_template).st_size > self.max_file_size):
            self.close()
            self.ii += 1
            if (self.ii <= self.max_files):
                self.open()
            else:
                self.close()
                self.finished = True

    def open(self):
        self.fh = open(self.filename_template, 'w')
        self.writer = csv.writer(self.fh)
        self.writer.writerow(self.header)

    def write(self, text=""):
        self.writer = csv.writer(self.fh)
        self.writer.writerow([s.encode("utf-8") for s in text])
        self.fh.flush()
        self.rotate()

    def close(self):
        self.fh.close()

    @property
    def filename_template(self):
        return "%0.2d" % self.ii + "_" + self.filename


def comments():
    for comment in root.iter('comment'):
        created_at = comment.find("created-at").text
        value = comment.find("value").text
        author_id = comment.find("author-id").text
        if not value:
            continue
        yield created_at, value, author_id


def tickets(root):
    for ticket in root.iter('ticket'):
        nice_id = ticket.find("nice-id").text
        for comment in comments():
            created_at, value, author_id = comment
            yield nice_id, created_at, author_id, value


# Set arguments
args = {'directory': '',
        'filename': 'output.csv',
        'max_file_size': 10485760,
        'header': ['Ticket ID', 'Created Date', 'Author ID', 'Comment'],
        }

fout = RotateFile(**args)

for row in tickets(root):
    if not any(ignore_char in row for ignore_char in ignore_chars):
        print ','.join(row)
        fout.write(row)

Many thanks in advance.

1 answer:

Answer 0 (score: 0)

Here are the code changes made to fix the original problem, as discussed in the comments above:

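
For illustration, here is a minimal sketch of one way to scope the comments to their own ticket. It is not necessarily the exact change the answer refers to. The idea is to pass each ticket element into comments() and iterate from that element instead of from the document root, so only that ticket's descendants are searched. The sample XML is hypothetical: its layout (a tickets root and a comments wrapper) is assumed from the tag names used in the question, and the real file may be nested differently. The sketch uses findtext() in place of find(...).text so that a missing node does not raise an error, and it is written to run on Python 3 as well as 2.7.

```python
from xml.etree import ElementTree as ET

# Hypothetical sample input; the real layout is assumed from the tag names
# in the question (ticket, nice-id, comment, created-at, author-id, value).
SAMPLE_XML = """
<tickets>
  <ticket>
    <nice-id>992</nice-id>
    <comments>
      <comment>
        <created-at>2016-06-21</created-at>
        <author-id>232245</author-id>
        <value>Hi, this is what is happening.</value>
      </comment>
    </comments>
  </ticket>
  <ticket>
    <nice-id>996</nice-id>
    <comments>
      <comment>
        <created-at>2016-06-22</created-at>
        <author-id>231122</author-id>
        <value>This is another comment.</value>
      </comment>
      <comment>
        <created-at>2016-06-23</created-at>
        <author-id>231122</author-id>
        <value></value>
      </comment>
    </comments>
  </ticket>
</tickets>
""".strip()


def comments(ticket):
    """Yield (created_at, value, author_id) for the comments of one ticket."""
    # Iterating from the ticket element, not the document root, restricts
    # the search to that ticket's own descendants.
    for comment in ticket.iter('comment'):
        value = comment.findtext('value')
        if not value:
            continue  # skip comments whose <value> is missing or empty
        yield (comment.findtext('created-at'), value,
               comment.findtext('author-id'))


def tickets(root):
    """Yield one (nice_id, created_at, author_id, value) row per comment."""
    for ticket in root.iter('ticket'):
        nice_id = ticket.findtext('nice-id')
        for created_at, value, author_id in comments(ticket):
            yield nice_id, created_at, author_id, value


rows = list(tickets(ET.fromstring(SAMPLE_XML)))
```

With this sample, rows holds one entry for ticket 992 and one for ticket 996, each paired only with its own comment. The rest of the script (RotateFile and the final loop) can consume tickets(root) unchanged, since each row keeps the same four fields in the same order.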