How do I insert the ID into the node when converting .csv to XML?

Time: 2016-09-01 14:48:05

Tags: python

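Hello, I'm new to Python.

I want to convert a .csv file to XML. The desired output should look like this: each individual ID should appear in the node, as in <employee id="5">, and each person's variables should sit on separate lines beneath one another rather than on the same line: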

<?xml version="1.0" encoding="UTF-8"?>
<Document>
  <employee id="1">
    <Name>Steve</Name>
    <City>Boston</City>
    <Age>33</Age>
  </employee>
  <employee id="2">
    <Name>Michael</Name>
    <City>Dallas</City>
    <Age>45</Age>
  </employee>
  <employee id="3">
    <Name>John</Name>
    <City>New York</City>
    <Age>89</Age>
  </employee>
  <employee id="4">
    <Name>Thomas</Name>
    <City>LA</City>
    <Age>62</Age>
  </employee>
  <employee id="5">
    <Name>Clint</Name>
    <City>Paris</City>
    <Age>30</Age>
  </employee>
</Document>

Given some data:

import pandas
ID = pandas.DataFrame([1,2,3,4,5])
name = pandas.DataFrame(["Steve","Michael","John","Thomas","Clint"])
city = pandas.DataFrame(["Boston","Dallas","New York","LA","Paris"])
Age = pandas.DataFrame([45,33,33,20,50])
df = pandas.concat([ID, name,city,Age], axis=1)
df.columns = ['ID','name','city','Age']

df

   ID     name      city  Age
0   1    Steve    Boston   45
1   2  Michael    Dallas   33
2   3     John  New York   33
3   4   Thomas        LA   20
4   5    Clint     Paris   50

Converting .csv to XML:

import csv

csvFile = 'df.csv'
xmlFile = 'myData.xml'

csvData = csv.reader(open(csvFile))
xmlData = open(xmlFile, 'w')
xmlData.write('<?xml version="1.0"?>' + "\n")
# there must be only one top-level tag
xmlData.write('<Document>' + "\n")

rowNum = 0
for employee in csvData:
    if rowNum == 0:
       tags = employee 
       # replace spaces w/ underscores in tag names
       for i in range(len(tags)):
           tags[i] = tags[i].replace(' ', '_')
    else: 
       xmlData.write('<employee >' + "\n")
       for i in range(len(tags)):
           xmlData.write('    ' + '<' + tags[i] + '>' \
                        + employee [i] + '</' + tags[i] + '>' + "\n")
       xmlData.write('</employee >' + "\n")

    rowNum +=1

xmlData.write('</Document>' + "\n")
xmlData.close()

The output XML, which looks a bit off:

<?xml version="1.0"?>
<Document>
<employee>
    <X>1</X>
    <ID>1</ID>
    <Name>Steve</Name>
    <City>Boston</City>
    <Age>33</Age>
</employee>
<employee>
    <X>2</X>
    <ID>2</ID>
    <Name>Michael</Name>
    <City>Dallas</City>
    <Age>45</Age>
</employee>
<employee>
    <X>3</X>
    <ID>3</ID>
    <Name>John</Name>
    <City>New York</City>
    <Age>89</Age>
</employee>
<employee>
    <X>4</X>
    <ID>4</ID>
    <Name>Thomas</Name>
    <City>LA</City>
    <Age>62</Age>
</employee>
<employee>
    <X>5</X>
    <ID>5</ID>
    <Name>Clint</Name>
    <City>Paris</City>
    <Age>30</Age>
</employee>
</Document>

2 answers:

Answer 0: (score: 2)

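When you create the csv reader object, you need to specify the delimiter that your csv file actually uses (the default is ','), for example: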

csvData = csv.reader(open(csvFile), delimiter=' ')

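If the delimiter is not given and the file does not use commas, each row is not split into columns, so the entries in tags will not have the format you want.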
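
To illustrate the point about delimiters, here is a minimal sketch that parses the same text twice. The semicolon-separated sample string is a hypothetical stand-in for df.csv, not the asker's actual file.

```python
import csv
import io

# Hypothetical sample: a header and one row, separated by ';'
sample = "ID;name;city;Age\n1;Steve;Boston;45\n"

# The default delimiter ',' finds no commas, so each row stays a single field
default_rows = list(csv.reader(io.StringIO(sample)))

# With the matching delimiter, each row splits into its four columns
semicolon_rows = list(csv.reader(io.StringIO(sample), delimiter=';'))

print(default_rows[0])    # one field containing the whole header line
print(semicolon_rows[0])  # four separate column names
```

With the wrong delimiter the header row comes back as one long string, which is why the generated tag names look wrong.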

The else branch of the for loop is also incorrect. This should be the solution:

import csv

csvFile = 'df.csv'
xmlFile = 'myData.xml'

# the with block closes both files, even if an error occurs
with open(csvFile) as src, open(xmlFile, 'w') as xmlData:
    csvData = csv.reader(src, delimiter=';')
    xmlData.write('<?xml version="1.0"?>' + "\n")
    # there must be only one top-level tag
    xmlData.write('<Document>' + "\n")

    # the first row holds the column names;
    # replace spaces w/ underscores in tag names
    tags = [tag.replace(' ', '_') for tag in next(csvData)]

    for employee in csvData:
        # the first column (the ID) becomes an attribute of <employee>
        xmlData.write('<employee ' + tags[0] + '="' + employee[0] + '">' + "\n")
        # the remaining columns become child elements
        for i in range(1, len(tags)):
            xmlData.write('    ' + '<' + tags[i] + '>'
                          + employee[i] + '</' + tags[i] + '>' + "\n")
        xmlData.write('</employee>' + "\n")

    xmlData.write('</Document>' + "\n")
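
One caveat when building XML by string concatenation as in the code above: characters such as & or < in the data are written unescaped and make the file invalid. A minimal sketch of one way to guard against that uses escape and quoteattr from the standard xml.sax.saxutils module. The helper name employee_xml and the sample value 'Steve & Co' are hypothetical and used only for illustration.

```python
from xml.sax.saxutils import escape, quoteattr


def employee_xml(employee_id, fields):
    """Build one <employee> block; `fields` is a list of (tag, text) pairs."""
    # quoteattr escapes the value and wraps it in quotes
    lines = ['<employee id=' + quoteattr(employee_id) + '>']
    for tag, text in fields:
        # escape replaces &, < and > with their XML entities
        lines.append('    <' + tag + '>' + escape(text) + '</' + tag + '>')
    lines.append('</employee>')
    return '\n'.join(lines)


# Hypothetical row whose name contains an ampersand
block = employee_xml('1', [('Name', 'Steve & Co'), ('City', 'Boston')])
print(block)
```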

Answer 1: (score: 2)

This is much easier with an XML library. Below is an example using the xml.etree.ElementTree module. I assume you converted the dataframe to csv with df.to_csv('df.csv').

import csv
import xml.etree.ElementTree as ET

csvFile = 'df.csv'

root = ET.Element('Document')
with open(csvFile) as f:
    csvData = csv.reader(f)
    next(csvData)  # skip header
    # the first column is the DataFrame index written by to_csv; ignore it
    for _, employee_id, name, city, age in csvData:
        employee_elem = ET.SubElement(root, "employee")
        employee_elem.set('id', employee_id)  # set attribute

        # Child elements
        name_elem = ET.SubElement(employee_elem, "Name")
        name_elem.text = name
        city_elem = ET.SubElement(employee_elem, "City")
        city_elem.text = city
        age_elem = ET.SubElement(employee_elem, "Age")
        age_elem.text = age

ET.ElementTree(root).write('df.xml', encoding='utf-8', xml_declaration=True)
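
ElementTree's write does not add line breaks or indentation, while the question asks for each element on its own line. A minimal sketch of one way to get that layout is to re-parse the serialized tree with the standard xml.dom.minidom module and call toprettyxml. The in-memory sample below is a hypothetical stand-in for df.csv as written by df.to_csv('df.csv'), with the index in the first, unnamed column.

```python
import csv
import io
import xml.etree.ElementTree as ET
from xml.dom import minidom

# Hypothetical stand-in for df.csv; the first column is the DataFrame index
sample = ",ID,name,city,Age\n0,1,Steve,Boston,45\n1,2,Michael,Dallas,33\n"

reader = csv.reader(io.StringIO(sample))
next(reader)  # skip header

root = ET.Element('Document')
for _, employee_id, name, city, age in reader:
    # keyword arguments to SubElement become attributes
    employee_elem = ET.SubElement(root, 'employee', id=employee_id)
    ET.SubElement(employee_elem, 'Name').text = name
    ET.SubElement(employee_elem, 'City').text = city
    ET.SubElement(employee_elem, 'Age').text = age

# Re-parse the serialized tree with minidom and indent by two spaces
pretty = minidom.parseString(ET.tostring(root)).toprettyxml(indent='  ')
print(pretty)
```

On Python 3.9 or later, calling ET.indent(root) before writing is a shorter alternative that avoids the second parse.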