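I want to read a CSV file and replace tags in an XML file with the values from the second column of the CSV. The values of the 'name' tag are in the first column.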
A | B
Value1 | ValueX
Value2 | ValueX
Value3 | ValueY
The XML structure looks like this:
<products>
  <product>
    <name>Value1</name>
  </product>
  <product>
    <name>Value2</name>
  </product>
  <product>
    <name>Value3</name>
  </product>
</products>
Python code:
import csv
import collections
import xml.etree.ElementTree

tree = xml.etree.ElementTree.parse("jolly.xml").getroot()

with open('file.csv', 'r') as f:
    reader = csv.DictReader(f)  # read rows into a dictionary format
    reader = csv.reader(f, dialect=csv.excel_tab)
    list = list(reader)
    columns = collections.defaultdict(list)  # each value in each column is appended to a list
    for (k, v) in row.items():  # go over each column name and value
        columns[k].append(v)  # append the value into the appropriate list

print columns['A']
print columns['B']

for elem in tree.findall('.//name'):
    if elem.attrib['name'] == columns['A']:
        elem.attrib['name'] == columns['B']
How should I go about this?
The CSV file looks like the table at the top of the question.
The output should look like this:
Value1 should be replaced with ValueX
OK, here is my solution:
import lxml.etree as ET

arr = ["Value1", "Value2", "Value3"]
arr2 = ["ValueX", "ValueX", "ValueY"]

with open('file.xml', 'rb+') as f:
    tree = ET.parse(f)
    root = tree.getroot()
    for i, item in enumerate(arr):
        # check every <name> element for the current search value
        for elem in root.findall('.//name'):
            if elem.text and item in elem.text:
                print(item)
                print(arr2[i])
                elem.text = elem.text.replace(item, arr2[i])
    # overwrite the file in place with the modified tree
    f.seek(0)
    f.write(ET.tostring(tree, encoding='UTF-8', xml_declaration=True))
    f.truncate()
I am using arrays here. I could copy the values from the file into the arrays, but for large files this needs better code.
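One possible way to drop the hard-coded arrays is to read the CSV into a dictionary once and look each <name> value up in it, so the XML is only walked a single time. The following is a minimal sketch under some assumptions: the CSV is comma-separated with a header row (A,B), the helper names load_mapping and replace_names are made up for illustration, and inline sample strings stand in for file.csv and file.xml.

```python
import csv
import io
import xml.etree.ElementTree as ET


def load_mapping(csv_file):
    """Return {first column: second column}, skipping the header row."""
    reader = csv.reader(csv_file)
    next(reader, None)  # assumption: the first row is the header (A,B)
    return {row[0]: row[1] for row in reader if len(row) >= 2}


def replace_names(root, mapping):
    """Replace the text of every <name> element that appears in the mapping."""
    for elem in root.iter('name'):
        if elem.text in mapping:
            elem.text = mapping[elem.text]
    return root


# Inline sample data standing in for file.csv and file.xml (hypothetical)
sample_csv = io.StringIO("A,B\nValue1,ValueX\nValue2,ValueX\nValue3,ValueY\n")
sample_xml = (
    "<products>"
    "<product><name>Value1</name></product>"
    "<product><name>Value2</name></product>"
    "<product><name>Value3</name></product>"
    "</products>"
)

mapping = load_mapping(sample_csv)
root = replace_names(ET.fromstring(sample_xml), mapping)
new_names = [elem.text for elem in root.iter('name')]
print(new_names)
```

With real files, the same two helpers could be fed open('file.csv', newline='') and the root of ET.parse('file.xml'). The dictionary lookup keeps the work per element constant, which is the part that matters for large files.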
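Answer 0 (score: 0)
Consider using XSLT, a special-purpose declarative language designed to restructure XML files. Like most other general-purpose languages, including ASP, C#, Java, PHP, Perl, and VB, Python has an XSLT 1.0 processor available, specifically through the third-party lxml module.
For your purposes, you can dynamically build the XSLT string used for the transformation. You only need to loop through the CSV data: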
import csv
import lxml.etree as ET

# READ IN CSV DATA AND APPEND TO LIST
csvdata = []
with open('file.csv', 'r') as csvfile:
    readCSV = csv.reader(csvfile)
    for line in readCSV:
        csvdata.append(line)

# DYNAMICALLY CREATE XSLT STRING
xsltstr = '''<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output version="1.0" encoding="UTF-8" indent="yes" />
<xsl:strip-space elements="*"/>

<!-- Identity Transform -->
<xsl:template match="@*|node()">
  <xsl:copy>
    <xsl:apply-templates select="@*|node()"/>
  </xsl:copy>
</xsl:template>

'''

# ADD ONE TEMPLATE PER CSV ROW: RENAME <name> TO THE SECOND-COLUMN VALUE
for i in range(len(csvdata)):
    xsltstr = xsltstr + \
    '''<xsl:template match="name[.='{0}']">
  <xsl:element name="{1}">
    <xsl:apply-templates />
  </xsl:element>
</xsl:template>

'''.format(*csvdata[i])

xsltstr = xsltstr + '</xsl:transform>'

# PARSE ORIGINAL FILE AND XSLT STRING
dom = ET.parse('jolly.xml')
xslt = ET.fromstring(xsltstr)

# TRANSFORM XML
transform = ET.XSLT(xslt)
newdom = transform(dom)

# OUTPUT FINAL XML (PRETTY PRINT)
tree_out = ET.tostring(newdom, encoding='UTF-8', pretty_print=True, xml_declaration=True)
with open('final.xml', 'wb') as xmlfile:
    xmlfile.write(tree_out)
Output
<?xml version='1.0' encoding='UTF-8'?>
<products>
  <product>
    <ValueX>Value1</ValueX>
  </product>
  <product>
    <ValueX>Value2</ValueX>
  </product>
  <product>
    <ValueY>Value3</ValueY>
  </product>
</products>
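For comparison, the same tag rename can probably be done without XSLT by assigning to each element's tag directly. This is only a rough sketch using the standard library's xml.etree.ElementTree rather than lxml; the function name rename_name_tags is hypothetical, the CSV is assumed to be comma-separated with no header row, and inline strings stand in for the real files.

```python
import csv
import io
import xml.etree.ElementTree as ET


def rename_name_tags(root, rows):
    """Rename each <name> element to the tag paired with its text in rows."""
    new_tags = {row[0]: row[1] for row in rows if len(row) >= 2}
    # Materialise the matches first, since the tags change while we loop
    for elem in list(root.iter('name')):
        new_tag = new_tags.get(elem.text)
        if new_tag:
            elem.tag = new_tag
    return root


# Inline sample data (hypothetical stand-ins for file.csv and jolly.xml)
sample_csv = io.StringIO("Value1,ValueX\nValue2,ValueX\nValue3,ValueY\n")
sample_xml = (
    "<products>"
    "<product><name>Value1</name></product>"
    "<product><name>Value2</name></product>"
    "<product><name>Value3</name></product>"
    "</products>"
)

root = rename_name_tags(ET.fromstring(sample_xml), csv.reader(sample_csv))
renamed_tags = [product[0].tag for product in root]
kept_texts = [product[0].text for product in root]
print(ET.tostring(root, encoding='unicode'))
```

This version does not check that the CSV values are valid XML element names, so values containing spaces or starting with a digit would yield a document that cannot be parsed back.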