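I built this script to parse an XML file, retrieve the contents of the id, created-date, author-id, and comments nodes, and print them to a CSV. It mostly works, with one exception...
The problem is that the script loops over every id in the XML and, for each one, prints every comment in the XML, regardless of whether the comment belongs to that id.
Ideally, the end goal is to fetch and print only the comments that belong to each unique id, printing the contents of the comment node.
Example of the problem (CSV output):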
Ticket ID,Created Date,Author ID,Comment
992,2016-06-21,232245,"Hi, this is what is happening."
992,2016-06-22,231122,"This is another comment from the same id."
996,2016-06-21,232245,"Hi, this is what is happening."
996,2016-06-22,231122,"This is another comment from the same id."
I only want to print the comments related to each id, not all comments for every id (if that makes sense).
Here is the code:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import os
import sys
from xml.etree import ElementTree as ET
import csv

xml_file = sys.argv[1]
if not xml_file.endswith('.xml'):
    print "%s is not a valid XML file. Exiting." % xml_file
    exit()

tree = ET.parse(xml_file)
root = tree.getroot()

# Ignore characters/string(s) (if any)
ignore_chars = ['>', '>>']


class RotateFile(object):
    def __init__(self, directory='', filename='', max_files=sys.maxint,
                 max_file_size='', header=''):
        self.ii = 1
        self.header = header
        self.directory, self.filename = directory, filename
        self.max_file_size, self.max_files = max_file_size, max_files
        self.finished, self.fh = False, None
        self.open()

    def rotate(self):
        """Rotate the file, if necessary"""
        if (os.stat(self.filename_template).st_size > self.max_file_size):
            self.close()
            self.ii += 1
            if (self.ii <= self.max_files):
                self.open()
            else:
                self.close()
                self.finished = True

    def open(self):
        self.fh = open(self.filename_template, 'w')
        self.writer = csv.writer(self.fh)
        self.writer.writerow(self.header)

    def write(self, text=""):
        self.writer = csv.writer(self.fh)
        self.writer.writerow([s.encode("utf-8") for s in text])
        self.fh.flush()
        self.rotate()

    def close(self):
        self.fh.close()

    @property
    def filename_template(self):
        return "%0.2d" % self.ii + "_" + self.filename


def comments():
    for comment in root.iter('comment'):
        created_at = comment.find("created-at").text
        value = comment.find("value").text
        author_id = comment.find("author-id").text
        if not value:
            continue
        yield created_at, value, author_id


def tickets(root):
    for ticket in root.iter('ticket'):
        nice_id = ticket.find("nice-id").text
        for comment in comments():
            created_at, value, author_id = comment
            yield nice_id, created_at, author_id, value


# Set arguments
args = {'directory': '',
        'filename': 'output.csv',
        'max_file_size': 10485760,
        'header': ['Ticket ID', 'Created Date', 'Author ID', 'Comment'],
        }
fout = RotateFile(**args)
for row in tickets(root):
    if not any(ignore_chars in row for ignore_char in ignore_chars):
        print ','.join(row)
        fout.write(row)
Thanks in advance.
Answer 0 (score: 0)
Here are the code changes made to the original question, as described in the comments above:
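import java.util.Objects;

class YourObject {
    private Some some;

    public Some getSome() { return some; }

    // Delegates the comparison to the wrapped Some instance.
    public boolean matchesSomeOther(YourObject o2) {
        return this.getSome().matchesSomeOther(o2.getSome());
    }
}

class Some {
    private SomeOther someOther;

    public SomeOther getSomeOther() { return someOther; }

    // Null-safe equality check on the nested SomeOther values.
    public boolean matchesSomeOther(Some some2) {
        return Objects.equals(this.getSomeOther(), some2.getSomeOther());
    }
}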
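For the Python script in the question, the duplicated rows come from comments() calling root.iter('comment'), which walks the whole document again for every ticket. A minimal sketch of one possible fix is to search from the current ticket element instead. It assumes each comment element is nested somewhere under its own ticket element. The sample XML below is hypothetical, since the real file layout is not shown in the question.

```python
from xml.etree import ElementTree as ET

# Hypothetical sample data; the real XML structure is an assumption.
SAMPLE_XML = """
<tickets>
  <ticket>
    <nice-id>992</nice-id>
    <comments>
      <comment>
        <created-at>2016-06-21</created-at>
        <author-id>232245</author-id>
        <value>Hi, this is what is happening.</value>
      </comment>
    </comments>
  </ticket>
  <ticket>
    <nice-id>996</nice-id>
    <comments>
      <comment>
        <created-at>2016-06-22</created-at>
        <author-id>231122</author-id>
        <value>This is another comment.</value>
      </comment>
      <comment>
        <created-at>2016-06-23</created-at>
        <author-id>231122</author-id>
        <value></value>
      </comment>
    </comments>
  </ticket>
</tickets>
""".strip()


def comments(ticket):
    """Yield only the comments nested under the given ticket element."""
    for comment in ticket.iter('comment'):
        value = comment.findtext('value')
        if not value:
            # Skip comments with an empty or missing value, as the original does.
            continue
        yield comment.findtext('created-at'), value, comment.findtext('author-id')


def tickets(root):
    """Yield one row per (ticket, comment) pair, scoped to each ticket."""
    for ticket in root.iter('ticket'):
        nice_id = ticket.findtext('nice-id')
        for created_at, value, author_id in comments(ticket):
            yield nice_id, created_at, author_id, value


rows = list(tickets(ET.fromstring(SAMPLE_XML)))
for row in rows:
    print(','.join(row))
```

The only structural change is that comments() takes the ticket as an argument and iterates from it, so each id is paired only with its own comments. If the comments in the real file are not children of the ticket element, this approach would need a different way to link them.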