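Hello, I'm new to Python and I want to convert a .csv file to XML. The expected output should look like the following: each individual ID should appear in the node, e.g. <employee id="5">, and each person's variables should be listed below one another rather than on the same line: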
<?xml version="1.0" encoding="UTF-8"?>
<Document>
<employee id="1">
<Name>Steve</Name>
<City>Boston</City>
<Age>33</Age>
</employee>
<employee id="2">
<Name>Michael</Name>
<City>Dallas</City>
<Age>45</Age>
</employee>
<employee id="3">
<Name>John</Name>
<City>New York</City>
<Age>89</Age>
</employee>
<employee id="4">
<Name>Thomas</Name>
<City>LA</City>
<Age>62</Age>
</employee>
<employee id="5">
<Name>Clint</Name>
<City>Paris</City>
<Age>30</Age>
</employee>
</Document>
Some sample data:
import pandas
ID = pandas.DataFrame([1,2,3,4,5])
name = pandas.DataFrame(["Steve","Michael","John","Thomas","Clint"])
city = pandas.DataFrame(["Boston","Dallas","New York","LA","Paris"])
Age = pandas.DataFrame([45,33,33,20,50])
df = pandas.concat([ID, name,city,Age], axis=1)
df.columns = ['ID','name','city','Age']
df
ID name city Age
0 1 Steve Boston 45
1 2 Michael Dallas 33
2 3 John New York 33
3 4 Thomas LA 20
4 5 Clint Paris 50
Converting from .csv to XML:
import csv
csvFile = 'df.csv'
xmlFile = 'myData.xml'
csvData = csv.reader(open(csvFile))
xmlData = open(xmlFile, 'w')
xmlData.write('<?xml version="1.0"?>' + "\n")
# there must be only one top-level tag
xmlData.write('<Document>' + "\n")
rowNum = 0
for employee in csvData:
    if rowNum == 0:
        tags = employee
        # replace spaces w/ underscores in tag names
        for i in range(len(tags)):
            tags[i] = tags[i].replace(' ', '_')
    else:
        xmlData.write('<employee >' + "\n")
        for i in range(len(tags)):
            xmlData.write(' ' + '<' + tags[i] + '>' \
                          + employee[i] + '</' + tags[i] + '>' + "\n")
        xmlData.write('</employee >' + "\n")
    rowNum += 1
xmlData.write('</Document>' + "\n")
xmlData.close()
The output XML looks a bit off:
<?xml version="1.0"?>
<Document>
<employee>
<X>1</X>
<ID>1</ID>
<Name>Steve</Name>
<City>Boston</City>
<Age>33</Age>
</employee>
<employee>
<X>2</X>
<ID>2</ID>
<Name>Michael</Name>
<City>Dallas</City>
<Age>45</Age>
</employee>
<employee>
<X>3</X>
<ID>3</ID>
<Name>John</Name>
<City>New York</City>
<Age>89</Age>
</employee>
<employee>
<X>4</X>
<ID>4</ID>
<Name>Thomas</Name>
<City>LA</City>
<Age>62</Age>
</employee>
<employee>
<X>5</X>
<ID>5</ID>
<Name>Clint</Name>
<City>Paris</City>
<Age>30</Age>
</employee>
</Document>
Answer 0 (score: 2)
When creating the csv reader object, you need to specify the delimiter used in your csv file (the default is ',').
csvData = csv.reader(open(csvFile), delimiter=' ')
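If it is not given and your file uses a different delimiter, the rows are not split into the fields you expect, so the tag entries will not have the format you want.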
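To see why the delimiter matters, here is a small sketch that parses the same semicolon-separated text twice, once with the default delimiter and once with delimiter=';'. The sample text, and the use of io.StringIO in place of a real file, are assumptions made for illustration only.

```python
import csv
import io

# Hypothetical sample data; a real script would open 'df.csv' instead.
sample = "ID;name;city;Age\n1;Steve;Boston;45\n"

# Default delimiter (','): each line comes back as a single unsplit field.
rows_default = list(csv.reader(io.StringIO(sample)))

# Matching delimiter (';'): each line is split into four fields.
rows_semicolon = list(csv.reader(io.StringIO(sample), delimiter=';'))

print(rows_default[0])    # ['ID;name;city;Age']
print(rows_semicolon[0])  # ['ID', 'name', 'city', 'Age']
```

With the wrong delimiter, the header row becomes one long "tag" containing semicolons, which is not a usable XML tag name.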
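The else part of your for loop is also incorrect. This should be the solution (note that it reads a semicolon-delimited file; set delimiter to match your own file):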
import csv
csvFile = 'df.csv'
xmlFile = 'myData.xml'
csvData = csv.reader(open(csvFile), delimiter=';')
xmlData = open(xmlFile, 'w')
xmlData.write('<?xml version="1.0"?>' + "\n")
# there must be only one top-level tag
xmlData.write('<Document>' + "\n")
rowNum = 0
for employee in csvData:
    if rowNum == 0:
        tags = employee
        # replace spaces w/ underscores in tag names
        for i in range(len(tags)):
            tags[i] = tags[i].replace(' ', '_')
    else:
        # the first column becomes an attribute of the employee tag
        xmlData.write('<employee ' + tags[0] + '="' + employee[0] + '" >' + "\n")
        # the remaining columns become child elements
        for i in range(1, len(tags)):
            xmlData.write(' ' + '<' + tags[i] + '>' \
                          + employee[i] + '</' + tags[i] + '>' + "\n")
        xmlData.write('</employee >' + "\n")
    rowNum += 1
xmlData.write('</Document>' + "\n")
xmlData.close()
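Because this approach builds the XML by concatenating strings, a value containing a character such as & or < would produce invalid XML. Below is a minimal sketch of the same loop with escaping added, assuming the first column holds the ID. The function name rows_to_xml and the sample rows are hypothetical; escape and quoteattr come from the standard xml.sax.saxutils module.

```python
from xml.sax.saxutils import escape, quoteattr


def rows_to_xml(rows):
    """Build the XML text from an iterable of rows; the first row is the header."""
    rows = iter(rows)
    # Replace spaces with underscores so the header cells can serve as tag names.
    tags = [tag.replace(' ', '_') for tag in next(rows)]
    lines = ['<?xml version="1.0"?>', '<Document>']
    for row in rows:
        # The first column becomes an attribute; quoteattr adds the quotes.
        lines.append('<employee %s=%s>' % (tags[0], quoteattr(row[0])))
        for tag, value in zip(tags[1:], row[1:]):
            # escape() converts &, < and > into entities.
            lines.append('    <%s>%s</%s>' % (tag, escape(value), tag))
        lines.append('</employee>')
    lines.append('</Document>')
    return '\n'.join(lines) + '\n'


# Hypothetical rows, shaped like what csv.reader would yield.
sample_rows = [['ID', 'name', 'city'], ['1', 'Tom & Jerry', 'New York']]
print(rows_to_xml(sample_rows))
```

To use it with a file, the rows from csv.reader(open(csvFile), delimiter=';') could be passed in directly and the returned string written to the output file.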
Answer 1 (score: 2)
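It is much easier to use an XML library. Here is an example using the xml.etree.ElementTree module. I am assuming you wrote the file with df.to_csv('df.csv'), so the first column of each row is the DataFrame index: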
import csv
import xml.etree.ElementTree as ET
csvFile = 'df.csv'
csvData = csv.reader(open(csvFile))
root = ET.Element('Document')
next(csvData) # skip header
for _, employee_id, name, city, age in csvData:
    employee_elem = ET.SubElement(root, "employee")
    employee_elem.set('id', employee_id)  # set attribute
    # Child elements
    name_elem = ET.SubElement(employee_elem, "Name")
    name_elem.text = name
    city_elem = ET.SubElement(employee_elem, "City")
    city_elem.text = city
    age_elem = ET.SubElement(employee_elem, "Age")
    age_elem.text = age
ET.ElementTree(root).write('df.xml', encoding='utf-8', xml_declaration=True)
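Note that ElementTree writes the whole document on a single line by default. To get each element on its own line, as in the desired output, one option is ET.indent(), which is available in Python 3.9 and later. A minimal sketch, using a small in-memory tree with hypothetical sample values instead of the csv file:

```python
import xml.etree.ElementTree as ET

# Build a small tree by hand; the values are hypothetical sample data.
root = ET.Element('Document')
employee_elem = ET.SubElement(root, 'employee', id='1')
ET.SubElement(employee_elem, 'Name').text = 'Steve'
ET.SubElement(employee_elem, 'City').text = 'Boston'

# Insert newlines and indentation in place (requires Python 3.9+).
ET.indent(root, space='  ')

xml_text = ET.tostring(root, encoding='unicode')
print(xml_text)
```

In the example above, calling ET.indent(root) just before ET.ElementTree(root).write(...) should have the same effect on the file that gets written.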