stackoverflow用户通过帮助我将XML文件解析为csv文件帮助我节省了大量时间。非常有帮助,我决定尝试遍历整个目录并将所有xml文件解析为单个csv文件。以下是有效的代码:
import xml.etree.ElementTree as ET
import csv
extra_columns = 2
fields = [
('Id_Customer', 'Id_Customer', 1),
('Segment', 'Segment', 1),
('Nature', 'Event/Nature', 1),
('Extrainfo', 'Event/Extrainfo', 1),
('zipcode', 'Adress/zipcode', extra_columns),
('street', 'Adress/street', extra_columns),
('number', 'Adress/number', extra_columns)]
tree = ET.parse('cat.xml')
root = tree.getroot()
# Auto create the header from fields
fieldnames = []
for field, match, cols in fields:
fieldnames.append(field)
if cols > 1:
fieldnames.extend(["{}{}".format(field, x+2) for x in range(extra_columns)])
with open(r'customerdata.csv', 'wb') as f_customerdata:
csv_customerdata = csv.DictWriter(f_customerdata, fieldnames=fieldnames)
csv_customerdata.writeheader()
for node in tree.iter('Customer'):
row = {}
for field_name, match, cols in fields:
if cols > 1:
for index, el in enumerate(node.findall(match)):
try:
if index:
row["{}{}".format(field_name, index+1)] = el.text
else:
row[field_name] = el.text
except AttributeError as e:
row[field_name] = ''
else:
try:
row[field_name] = node.find(match).text
except AttributeError as e:
row[field_name] = ''
csv_customerdata.writerow(row)
然后我尝试引入listdir来查找所有文件名(这一步有效):
for filename in os.path.join(r'C:\docs', filename):
if not filename.endswith('.xml'): continue
fullname = os.path.join(r'C:\docs', filename)
然而,当我尝试整合这些步骤时,我发现我只是从目录中的第一个xml文件中获取数据。请参阅下面的合并代码。我试图找出为什么我的for循环没有迭代并将每个解析的xml文件写入csv文件。
import xml.etree.ElementTree as ET
import csv
import os
extra_columns = 2
fields = [
('Id_Customer', 'Id_Customer', 1),
('Segment', 'Segment', 1),
('Nature', 'Event/Nature', 1),
('Extrainfo', 'Event/Extrainfo', 1),
('zipcode', 'Adress/zipcode', extra_columns),
('street', 'Adress/street', extra_columns),
('number', 'Adress/number', extra_columns)]
#tree = ET.parse('cat.xml')
#root = tree.getroot()
# Auto create the header from fields
fieldnames = []
with open(r'customerdata.csv', 'wb') as f_customerdata:
csv_customerdata = csv.DictWriter(f_customerdata, fieldnames=fieldnames)
csv_customerdata.writeheader()
for filename in os.listdir(r'C:\docs'):
if not filename.endswith('.xml'): continue
fullname = os.path.join(r'C:\docs',filename)
tree = ET.parse(fullname)
root = tree.getroot()
for node in tree.iter('Customer'):
row = {}
for field_name, match, cols in fields:
if cols > 1:
for index, el in enumerate(node.findall(match)):
try:
if index:
row["{}{}".format(field_name, index+1)] = el.text
else:
row[field_name] = el.text
except AttributeError as e:
row[field_name] = ''
else:
try:
row[field_name] = node.find(match).text
except AttributeError as e:
row[field_name] = ''
csv_customerdata.writerow(row)
更新:
with open(r'customerdata.csv', 'wb') as f_customerdata:
csv_customerdata = csv.DictWriter(f_customerdata, fieldnames=fieldnames)
csv_customerdata.writeheader()
for filename in os.listdir(r'C:\docs'):
if not filename.endswith('.xml'): continue
fullname = os.path.join(r'C:\docs',filename)
tree = ET.parse(fullname)
root = tree.getroot()
for node in tree.iter('Customer'):
row = {}
for field_name, match, cols in fields:
if cols > 1:
for index, el in enumerate(node.findall(match)):
try:
if index:
row["{}{}".format(field_name, index+1)] = el.text
else:
row[field_name] = el.text
except AttributeError as e:
row[field_name] = ''
else:
try:
row[field_name] = node.find(match).text
except AttributeError as e:
row[field_name] = ''
csv_customerdata.writerow(row)
答案 0 :(得分:0)
您正在写入文件循环之外的CSV。以下行应该在遍历文件的循环中:
u_choice = [a, b, c, d]
def color_change_one(x, y):
global u_choice
if t_one.color() == ('grey', 'grey'):
t_one.color('red')
u_choice[0] = 'red'
elif t_one.color() == ('red', 'red'):
t_one.color('blue')
u_choice[0] = 'blue'
t_one = turtle.Turtle()
t_one.shape('circle')
t_one.color('grey')
t_one.onclick(color_change_one)
答案 1 :(得分:0)
不要考虑单个for
循环(除了循环文件),if
逻辑或辅助字典/列表对象。只需运行XSLT,这种转换语言可以将XML转换为其他XML,HTML,尤其是CSV / TXT文件。
但是,要运行XSLT 1.0脚本,您需要Python的第三方模块lxml
,而不是内置的etree
。有external processors Python可以调用甚至是内置的,例如Window的System.Xml.Xsl和Linux / Mac的xsltproc。
XSLT (另存为.xsl,另一个特殊格式良好的.xml文件)
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output indent="yes" method="text"/>
<xsl:strip-space elements="*"/>
<xsl:template match="/CAT">
<xsl:text>Id_Customer,Segment,Nature,Extrainfo,zipcode,zipcode2,zipcode3,street,street2,street3,number,number2,number3
</xsl:text>
<xsl:apply-templates select="Customer"/>
</xsl:template>
<xsl:template match="Customer">
<xsl:value-of select="concat(Id_Customer, ', ', Segment, ', ', Event/Nature, ', ', Event/Extrainfo, ', ',
Address[1]/zipcode, ', ', Address[2]/zipcode, ', ', Address[3]/zipcode, ', ',
Address[1]/street, ', ', Address[2]/street, ', ', Address[3]/street, ', ',
Address[1]/number, ', ', Address[2]/number, ', ', Address[3]/number)"/><xsl:text>
</xsl:text>
</xsl:template>
</xsl:stylesheet>
Python (通过此过程集成您的XML)
import lxml.etree as et
# LOAD XML AND XSL
doc = et.parse('Input.xml')
xsl = et.parse('XSLTScript.xsl')
# TRANSFORM XSLT
transform = et.XSLT(xsl)
result = transform(doc)
# SAVE RESULT TO FILE
with open('Output.csv', 'w') as f:
f.write(str(result))
输入示例(从上一个问题帖子扩展而来)
<?xml version="1.0"?>
<CAT>
<Header>...</Header>
<Add>...</Add>
<Customer>
<Id_Customer>xyz1</Id_Customer>
<Segment>abc1</Segment>
<Event>
<Nature>info1</Nature>
<Extrainfo>info2</Extrainfo>
</Event>
</Customer>
<Customer>
<Id_Customer>zzwy</Id_Customer>
<Segment>c2</Segment>
<Address>
<zipcode>99999</zipcode>
<street>belaire drive</street>
<number>5</number>
</Address>
<Address>
<zipcode>88888</zipcode>
<street>pennsylvania ave</street>
<number>10</number>
</Address>
<Address>
<zipcode>77777</zipcode>
<street>main street</street>
<number>15</number>
</Address>
</Customer>
</CAT>
<强>输出强>
Id_Customer,Segment,Nature,Extrainfo,zipcode,zipcode2,zipcode3,street,street2,street3,number,number2,number3
xyz1, abc1, info1, info2, , , , , , , , ,
zzwy, c2, , , 99999, 88888, 77777, belaire drive, pennsylvania ave, main street, 5, 10, 15