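I recently started using Python and ElementTree for a very specific task. I think I'm almost there, but there is one thing I can't quite work out. I'm querying XML files and pulling back the relevant data, then putting that data into a CSV file. It all works, but the problem is that elem.attrib["text"] actually returns multiple lines. When I put it into a variable and export it to CSV, only the first line is exported. Below is the code I'm using: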
import os
import csv
import xml.etree.cElementTree as ET

path = "/share/new"
c = csv.writer(open("/share/redacted.csv", "wb"))
c.writerow(["S","R","T","R2","R3"])

for filename in os.listdir(path):
    if filename.endswith('.xml'):
        fullname = os.path.join(path, filename)
        tree = ET.ElementTree(file=(fullname))
        for elem in tree.iterfind('PropertyList/Property[@name="Sender"]'):
            c1 = elem.attrib["value"]
        for elem in tree.iterfind('PropertyList/Property[@name="Recipient"]'):
            c2 = elem.attrib["value"]
        for elem in tree.iterfind('PropertyList/Property[@name="Date"]'):
            c3 = elem.attrib["value"]
        for elem in tree.iterfind('ChildContext/ResponseList/Response/TextualAnalysis/ExpressionList/Expression/Match'):
            c4 = elem.attrib["textView"]
        for elem in tree.iterfind('ChildContext/ResponseList/Response/TextualAnalysis/ExpressionList/Expression/Match/Matched'):
            c5 = elem.attrib["text"]
            print elem.attrib["text"]
            print c5
        c.writerow([(c1),(c2),(c3),(c4),(c5)])
The most important part is near the bottom. The output of printing elem.attrib["text"] is:
Apples
Bananas
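The output of printing c5 is the same (just to be clear, Apples and Bananas are on separate lines).
However, writing c5 to the CSV only outputs the first line, so only Apples appears in the CSV.
I hope this makes sense. What I need is to output both Apples and Bananas to the CSV (preferably in the same cell). The code above was developed in Python 2.7, but ideally I need it to work in 2.6 (I realise iterfind isn't in 2.6; I already have two versions of the code).
I would post the XML, but it's a bit of a beast. As suggested in the comments, here is a sanitised version of the XML.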
<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<Context>
  <PropertyList duplicates="true">
    <Property name="Sender" type="string" value="S:demo1@no-one.local"/>
    <Property name="Recipient" type="string" value="RPFD:no-one.local"/>
    <Property name="Date" type="string" value="Tue, 4 Aug 2015 13:24:16 +0100"/>
  </PropertyList>
  <ChildContext>
    <ResponseList>
      <Response>
        <Description>
          <Arg />
          <Arg />
        </Description>
        <TextualAnalysis version="2.0">
          <ExpressionList>
            <Expression specified=".CLEAN.(Apples)" total="1" >
              <Match textView="Body" truncated="false">
                <Surrounding text="..."/>
                <Surrounding text="How do you like them "/>
                <Matched cleaned="true" text="Apples " type="expression"/>
                <Surrounding text="???????? "/>
                <Surrounding text="..."/>
              </Match>
            </Expression>
          </ExpressionList>
        </TextualAnalysis>
      </Response>
    </ResponseList>
  </ChildContext>
  <ChildContext>
    <ResponseList>
      <Response>
        <Description>
          <Arg />
          <Arg />
        </Description>
        <TextualAnalysis version="2.0">
          <ExpressionList>
            <Expression specified=".CLEAN.(Bananas)" total="1" >
              <Match textView="Attach" truncated="false">
                <Surrounding text="..."/>
                <Surrounding text="Also I don't like... "/>
                <Matched cleaned="true" text="Bananas " type="expression"/>
                <Surrounding text="!!!!!!! "/>
                <Surrounding text="..."/>
              </Match>
            </Expression>
          </ExpressionList>
        </TextualAnalysis>
      </Response>
    </ResponseList>
  </ChildContext>
</Context>
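Answer 0 (score: 0)
The following joins all of the text attributes together and puts them on separate lines within the same CSV cell. You can change the '\n' separator to ' ' or ',' to put them on the same line instead. However, you may still run into problems with some of the other fields: your loops are not nested, and I don't really understand what you are trying to accomplish, so perhaps you also have more than one of some of the other values. Anyway: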
c5 = []
for elem in tree.iterfind('ChildContext/ResponseList/Response/TextualAnalysis/ExpressionList/Expression/Match/Matched'):
    c5.append(elem.attrib["text"])
c.writerow([c1, c2, c3, c4, '\n'.join(c5)])
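To make the join approach easier to try in isolation, here is a minimal self-contained sketch. It is written for Python 3 rather than the 2.6/2.7 used in the question, and it writes to an in-memory buffer instead of /share/redacted.csv. The names SAMPLE_XML, MATCH_PATH and build_row are hypothetical helpers introduced only for this illustration, and the XML is a trimmed-down stand-in for the sample in the question. It also joins the textView values, on the assumption that they may repeat in the same way the text values do.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down sample modelled on the XML in the question.
SAMPLE_XML = """<Context>
  <PropertyList duplicates="true">
    <Property name="Sender" type="string" value="S:demo1@no-one.local"/>
  </PropertyList>
  <ChildContext><ResponseList><Response><TextualAnalysis version="2.0">
    <ExpressionList><Expression><Match textView="Body">
      <Matched cleaned="true" text="Apples " type="expression"/>
    </Match></Expression></ExpressionList>
  </TextualAnalysis></Response></ResponseList></ChildContext>
  <ChildContext><ResponseList><Response><TextualAnalysis version="2.0">
    <ExpressionList><Expression><Match textView="Attach">
      <Matched cleaned="true" text="Bananas " type="expression"/>
    </Match></Expression></ExpressionList>
  </TextualAnalysis></Response></ResponseList></ChildContext>
</Context>"""

MATCH_PATH = ("ChildContext/ResponseList/Response/TextualAnalysis/"
              "ExpressionList/Expression/Match")


def build_row(root, separator="\n"):
    """Collect every match for one document into a single CSV row."""
    sender = root.find('PropertyList/Property[@name="Sender"]').get("value")
    # Gather all values in lists instead of overwriting one variable per loop.
    views = [m.get("textView") for m in root.iterfind(MATCH_PATH)]
    # Stripping the trailing space is a choice made for this sketch only.
    texts = [m.get("text").strip()
             for m in root.iterfind(MATCH_PATH + "/Matched")]
    return [sender, separator.join(views), separator.join(texts)]


root = ET.fromstring(SAMPLE_XML)
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["S", "R2", "R3"])
writer.writerow(build_row(root))  # multi-line cells are quoted by csv
csv_text = buffer.getvalue()
print(csv_text)
```

Passing a different separator, for example build_row(root, ", "), keeps every match on a single line within the cell. When the file is read back with csv.reader, the quoted cell comes back as one field that contains the embedded newline.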