From an XML file

Time: 2017-10-10 11:36:09

Tags: python xml python-3.x dataframe

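I have a real (and maybe very stupid) problem converting an XML file into a pandas DataFrame. I'm new to Python and need some help. I tried code from another thread and modified it, but it doesn't work.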

I want to iterate over this file:

<objects>
 <object id="123" name="some_string">
<object>
  <id>123</id>
  <site id="456" name="somename" query="some_query_as_string"/>
  <create-date>some_date</create-date>
  <update-date>some_date</update-date>
  <update-user id="567" name="User:xyz" query="some_query_as_string"/>
  <delete-date/>
  <delete-user/>
  <deleted>false</deleted>
  <system-object>false</system-object>
  <to-string>some_string_notifications</to-string>
</object>
<workflow>
  <workflow-type id="12345" name="WorkflowType_some_workflow" query="some_query_as_string"/>
  <validated>true</validated>
  <name>somestring</name>
  <exported>false</exported>
</workflow>

Here is my code:

import xml.etree.ElementTree as ET
import pandas as pd

path = "C:/Users/User/Desktop/test.xml"
with open(path, 'rb') as fp:
    content = fp.read()
parser = ET.XMLParser(encoding="utf-8")
tree = ET.fromstring(content, parser=parser)

def xml2df(tree):
root = ET.XML(tree)
all_records = []
for i, child in enumerate(root):
    record ={}
    for subchild in child:
        record[subchild.tag] = subchild.text
        all_records.append(record)
    return pd.DataFrame(all_records) 

Where is the problem? Please help :O

1 Answer:

Answer 0: (score: 1)

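You are passing the file location string to ET.fromstring(), which is not the actual content of the file. You need to read the file's contents first and then pass them to ET.fromstring():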

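import xml.etree.ElementTree as ET

path = "C:/Users/User/Desktop/test.xml"
# Read the raw bytes of the file, not just its path
with open(path, 'rb') as fp:
    content = fp.read()

# Parse the content; ET.fromstring() returns the root Element
parser = ET.XMLParser(encoding="utf-8")
tree = ET.fromstring(content, parser=parser)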
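
Once the content is parsed, the xml2df function from the question may still need attention: it calls ET.XML() on something that is already a parsed element, and it appears to append the record inside the inner loop and return before the outer loop has finished. Below is a minimal sketch of one possible correction, assuming one row per child of the root element. The names SAMPLE_XML, xml_to_records and xml_to_df are illustrative and not from the original post, and the sample is a shortened, properly closed stand-in for the real file.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample, shaped like the file in the question but well-formed.
SAMPLE_XML = """<objects>
  <object>
    <id>123</id>
    <deleted>false</deleted>
    <delete-date/>
  </object>
  <workflow>
    <validated>true</validated>
    <name>somestring</name>
  </workflow>
</objects>"""


def xml_to_records(xml_text):
    """Return one dict per child of the root, mapping sub-element tag to text."""
    root = ET.fromstring(xml_text)  # already an Element; no second ET.XML() call
    records = []
    for child in root:
        record = {sub.tag: sub.text for sub in child}
        records.append(record)  # append once per child, after its fields are read
    return records  # return only after the loop has finished


def xml_to_df(xml_text):
    """Build a DataFrame from the records (requires pandas)."""
    import pandas as pd  # imported here so xml_to_records works without pandas
    return pd.DataFrame(xml_to_records(xml_text))


records = xml_to_records(SAMPLE_XML)
print(records)
```

For a real file, the bytes read into content above can be passed in place of SAMPLE_XML. Note that this sketch only captures element text: empty elements such as delete-date yield None, and values stored in attributes (for example the id and name on site) would have to be read from each sub-element's attrib dictionary instead.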