How to create a Pandas DataFrame from nested XML

Asked: 2019-04-26 05:43:41

Tags: python xml pandas

I am trying to create a Pandas DataFrame from XML. The XML looks like this:

<?xml version="1.0" encoding="utf-8"?>
<Products>
    <Info>
        <Msg>Shop items.</Msg>
    </Info>
    <shop shopNr="01">
        <ItemNr>1001</ItemNr>
        <ItemNr>1002</ItemNr>
        <ItemNr>1003</ItemNr>
        <ItemNr>1004</ItemNr>
        <ItemNr>1010</ItemNr>
    </shop>
    <shop shopNr="02">
        <ItemNr>1002</ItemNr>
        <ItemNr>1006</ItemNr>
        <ItemNr>1005</ItemNr>
    </shop>
    <shop shopNr="03">
        <ItemNr>1009</ItemNr>
        <ItemNr>1006</ItemNr>
        <ItemNr>1005</ItemNr>
        <ItemNr>1002</ItemNr>
    </shop>
</Products>

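I tried using xml.etree.ElementTree, as in the code below. I have two problems.

First, I can't get the values of the ItemNr elements when walking the children of the root. Instead of a value such as 1001, I get

<Element 'ItemNr' at 0x000001E2D6C41B38>.

The second problem arises when I create the DataFrame from the lists. I end up with a list of lists of items. The result is currently empty because I can't get the values described above, but I would like to end up with a flattened list.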

import xml.etree.ElementTree as ET
import pandas as pd
data = 'example_shops.xml'
tree = ET.parse(data)
root = tree.getroot()

shops = []
items = []
for node in root.iter('shop'):
    shops.append(node.attrib.get('shopNr'))
    items.append(list(node))

d = {'shops': shops, 'items': items}
df = pd.DataFrame(d)

The resulting DataFrame:

 shops                 items
0    01  [[], [], [], [], []]
1    02          [[], [], []]
2    03      [[], [], [], []]
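
The empty brackets are most likely pandas rendering each Element as a sequence of its child elements, and an ItemNr element has none. The value itself lives in the element's `.text` attribute. A minimal sketch, using an inline excerpt of the XML (the `xml_text` variable is introduced here for illustration, in place of the example_shops.xml file):

```python
import xml.etree.ElementTree as ET

# Inline excerpt of the XML from the question, so no file is needed.
xml_text = """<Products>
    <shop shopNr="01">
        <ItemNr>1001</ItemNr>
        <ItemNr>1002</ItemNr>
    </shop>
</Products>"""

root = ET.fromstring(xml_text)
shop = root.find('shop')

children = list(shop)   # Element objects, not their values
first = children[0]

print(first.tag)        # ItemNr
print(len(first))       # 0, because <ItemNr> has no child elements
print(first.text)       # 1001 (a string)
```

Note that `.text` returns a string, so the item numbers would need an explicit `int(...)` conversion if numeric values are wanted.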

The desired output is:

  shops                           items
0    01  [1001, 1002, 1003, 1004, 1010]
1    02              [1002, 1006, 1005]
2    03        [1009, 1006, 1005, 1002]

2 answers:

Answer 0 (score: 2)

You want to append the text values of the ItemNr elements under each shop element to the items list, rather than the xml Element Python objects, which is what you are doing now.

The following code worked for me:
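
A minimal sketch of that approach, reconstructed from the description above rather than copied from the answer. It reuses the question's variable names and parses an inline copy of the XML (`xml_text`, introduced here for illustration) instead of the example_shops.xml file:

```python
import xml.etree.ElementTree as ET
import pandas as pd

# Inline copy of the XML from the question.
xml_text = """<?xml version="1.0"?>
<Products>
    <Info>
        <Msg>Shop items.</Msg>
    </Info>
    <shop shopNr="01">
        <ItemNr>1001</ItemNr>
        <ItemNr>1002</ItemNr>
        <ItemNr>1003</ItemNr>
        <ItemNr>1004</ItemNr>
        <ItemNr>1010</ItemNr>
    </shop>
    <shop shopNr="02">
        <ItemNr>1002</ItemNr>
        <ItemNr>1006</ItemNr>
        <ItemNr>1005</ItemNr>
    </shop>
    <shop shopNr="03">
        <ItemNr>1009</ItemNr>
        <ItemNr>1006</ItemNr>
        <ItemNr>1005</ItemNr>
        <ItemNr>1002</ItemNr>
    </shop>
</Products>"""

root = ET.fromstring(xml_text)

shops = []
items = []
for node in root.iter('shop'):
    shops.append(node.attrib.get('shopNr'))
    # Collect the text of each <ItemNr> child instead of the Element itself.
    items.append([item.text for item in node.findall('ItemNr')])

df = pd.DataFrame({'shops': shops, 'items': items})
print(df)
```

The only change from the question's loop is replacing `list(node)` with a list comprehension over `item.text`.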

Answer 1 (score: 1)

I hope this is the expected output:

import xml.etree.ElementTree as ET
import pandas as pd
data = 'example_shops.xml'
tree = ET.parse(data)
root = tree.getroot()
all_shops_items = []
for ashop in root.iter('shop'):
    # One row per shop: the shop number plus a list of its item numbers.
    shop_Nr = ashop.attrib.get('shopNr')
    items = []
    for anitem in ashop.iter('ItemNr'):
        items.append(anitem.text)  # .text is the element's string content
    all_shops_items.append([shop_Nr, items])
df = pd.DataFrame(all_shops_items, columns=['SHOP_NUMBER', 'ITEM_NUMBER'])
print(df)

Output:

  SHOP_NUMBER                     ITEM_NUMBER
0          01  [1001, 1002, 1003, 1004, 1010]
1          02              [1002, 1006, 1005]
2          03        [1009, 1006, 1005, 1002]

If you want one row per shop and item instead:

import xml.etree.ElementTree as ET
import pandas as pd
data = 'example_shops.xml'
tree = ET.parse(data)
root = tree.getroot()
all_shops_items = []
for ashop in root.iter('shop'):
    shop_Nr = ashop.attrib.get('shopNr')
    for anitem in ashop.iter('ItemNr'):
        # One row per (shop, item) pair.
        all_shops_items.append([shop_Nr, anitem.text])
df = pd.DataFrame(all_shops_items, columns=['SHOP_NUMBER', 'ITEM_NUMBER'])
print(df)

Output:

   SHOP_NUMBER ITEM_NUMBER
0           01        1001
1           01        1002
2           01        1003
3           01        1004
4           01        1010
5           02        1002
6           02        1006
7           02        1005
8           03        1009
9           03        1006
10          03        1005
11          03        1002
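
If the nested form (one list of items per shop) has already been built, the flat form can also be derived from it without re-parsing the XML. The sketch below is not part of the answer; it assumes pandas 0.25 or later, where `DataFrame.explode` is available, and uses a small hand-written frame (`nested`) for illustration:

```python
import pandas as pd

# Small stand-in for the nested DataFrame: one list of item numbers per shop.
nested = pd.DataFrame({
    'SHOP_NUMBER': ['01', '02'],
    'ITEM_NUMBER': [['1001', '1002'], ['1006']],
})

# explode() emits one row per list element and repeats the shop number.
flat = nested.explode('ITEM_NUMBER').reset_index(drop=True)

# The values come from Element.text, so they are strings until converted.
flat['ITEM_NUMBER'] = flat['ITEM_NUMBER'].astype(int)
print(flat)
```

`reset_index(drop=True)` is used because `explode` keeps the original row index, which would otherwise repeat for every item of the same shop.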