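Adding multiple dictionaries to a single pandas DataFrame

Time: 2019-03-06 11:01:29

Tags: pandas

I have a set of Python dictionaries that I obtain through a for loop. I am trying to add them to a pandas DataFrame.

Output of the variable named output: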

{'name':'Kevin','age':21}
{'name':'Steve','age':31}
{'name':'Mark','age':11}

I am trying to append each dictionary to a single DataFrame. I tried the following, but it only adds the first row.

df = pd.DataFrame(output)

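Can anyone suggest what is going wrong, and how to add all of the dictionaries to the DataFrame?

Update: the loop

The following code reads an XML file and converts it to a dictionary. I can now iterate over multiple XML files and create a dictionary for each one. I am trying to see how to add each of these dictionaries to a single DataFrame: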

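import xml.etree.ElementTree as ET

def f(elem, result):
    # Store this element's text under its tag, then recurse into its children
    result[elem.tag] = elem.text
    # Iterating an element yields its children (getchildren() was removed in Python 3.9)
    for c in elem:
        result = f(c, result)
    return result

result = {}
for file in allFiles:
    tree = ET.parse(file)
    root = tree.getroot()
    result = f(root, result)
    print(result)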

2 answers:

Answer 0: (score: 1)

You can append each dictionary to a list and call the DataFrame constructor once at the end:

out = []
for file in allFiles:
    tree = ET.parse(file)
    root = tree.getroot()
    # Pass a fresh dict so each file gets its own row instead of sharing one object
    result = f(root, {})
    out.append(result)

df = pd.DataFrame(out)
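
To make the pattern above easy to try without any files on disk, here is a minimal sketch. The in-memory XML strings and the names `flatten`, `xml_docs` and `rows` are made up for illustration and stand in for the question's `f` and `allFiles`; the real files may of course look different.

```python
import xml.etree.ElementTree as ET

import pandas as pd


def flatten(elem, result):
    """Recursively copy each element's tag and text into `result`."""
    result[elem.tag] = elem.text
    for child in elem:
        flatten(child, result)
    return result


# Hypothetical XML documents standing in for the files in allFiles
xml_docs = [
    "<person><name>Kevin</name><age>21</age></person>",
    "<person><name>Steve</name><age>31</age></person>",
    "<person><name>Mark</name><age>11</age></person>",
]

rows = []
for doc in xml_docs:
    root = ET.fromstring(doc)
    # A fresh dict per document, so every row is a separate object
    rows.append(flatten(root, {}))

# Build the DataFrame once, after the loop has collected all dicts
df = pd.DataFrame(rows)
print(df)
```

Note that the values stay strings (XML text is not converted to numbers), and the root tag `person` becomes a column of its own with empty values, because the root element has no text here.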

Answer 1: (score: 1)

We can add these dicts to a list:

ds = []
for ...:      # your loop
    ds.append(d)  # where d is one of the dicts

When we have the list of dicts, we can simply use pd.DataFrame on that list:

ds = [
    {'name':'Kevin','age':21},
    {'name':'Steve','age':31},
    {'name':'Mark','age':11}
]
pd.DataFrame(ds)

Output:

    name  age
0  Kevin   21
1  Steve   31
2   Mark   11

Update: It is not a problem if different dicts have different keys; the missing values are filled with NaN, e.g.:

ds = [
    {'name':'Kevin','age':21},
    {'name':'Steve','age':31,'location': 'NY'},
    {'name':'Mark','age':11,'favorite_food': 'pizza'}
]
pd.DataFrame(ds)

Output:

   age favorite_food location   name
0   21           NaN      NaN  Kevin
1   31           NaN       NY  Steve
2   11         pizza      NaN   Mark
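
If the column order or the NaN placeholders matter, one possible approach (a sketch, not the only way) is to pass `columns=` to the constructor and then call `fillna`. The column order shown in the output above may differ between pandas versions, so fixing it explicitly can help. The names `cols` and `df_filled` and the empty-string placeholder are my own choices for illustration.

```python
import pandas as pd

ds = [
    {'name': 'Kevin', 'age': 21},
    {'name': 'Steve', 'age': 31, 'location': 'NY'},
    {'name': 'Mark', 'age': 11, 'favorite_food': 'pizza'},
]

# Passing columns= fixes the column order explicitly
cols = ['name', 'age', 'location', 'favorite_food']
df = pd.DataFrame(ds, columns=cols)

# Replace missing values with a placeholder (the empty string is just an example)
df_filled = df.fillna('')
print(df_filled)
```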

Update 2: Building on our previous discussion in "Python - Converting xml to csv using Python pandas", we can do:

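import glob
import xml.etree.ElementTree as ET

import pandas as pd

# f is the recursive helper defined in the question
results = []
for file in glob.glob('*.xml'):
    tree = ET.parse(file)
    root = tree.getroot()
    result = f(root, {})       # fresh dict for every file
    result['filename'] = file  # added filename to our results
    results.append(result)

pd.DataFrame(results)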