Extracting a column from a pandas DataFrame

Time: 2018-02-05 16:33:29

Tags: python python-3.x pandas dictionary

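How can I extract a column, as a pandas DataFrame, from the following pandas DataFrame whose cells contain dictionaries? (I need all the values of 'name', together with the index.)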

nutrients_df
Out[63]: 
                                          nutrients
0     [{'code': '203cp1252', 'name': 'Proteincp1252'...
1     [{'code': '203cp1252', 'name': 'Proteincp1252'...
2     [{'code': '203cp1252', 'name': 'Proteincp1252'...
3     [{'code': '203cp1252', 'name': 'Proteincp1252'...
4     [{'code': '203cp1252', 'name': 'Proteincp1252'...
5     [{'code': '203cp1252', 'name': 'Proteincp1252'...
6     [{'code': '203cp1252', 'name': 'Proteincp1252'...

nutrients_df is built as a pandas DataFrame from a JSON database, as follows:

nutrients = []
for index, row in data_df.iterrows():
    nutrients.append(row['nutrients'])

# Build the DataFrame once, after the loop has collected every row
nutrients_df = pd.DataFrame({'nutrients': nutrients})

1 answer:

Answer 0 (score: 1)

I'm not sure which data type your df.nutrients series holds. Below are some examples of how to extract name from dictionary-like objects.

import pandas as pd
from ast import literal_eval

# If your columns are genuine dictionaries
df = pd.DataFrame([[{'code': '203cp1252', 'name': 'Proteincp1252'}],
                   [{'code': '203cp1252', 'name': 'Proteincp1253'}],
                   [{'code': '203cp1252', 'name': 'Proteincp1254'}]],
                  columns=['nutrients'])

df['name'] = df['nutrients'].apply(lambda x: x['name'])

# If your column is a string
df = pd.DataFrame([["{'code': '203cp1252', 'name': 'Proteincp1252'}"],
                   ["{'code': '203cp1252', 'name': 'Proteincp1253'}"],
                   ["{'code': '203cp1252', 'name': 'Proteincp1254'}"]],
                  columns=['nutrients'])

df['name'] = df['nutrients'].apply(lambda x: literal_eval(x)['name'])
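
The output in the question suggests that each cell actually holds a list of dictionaries rather than a single dictionary. If that is the case, the following is a minimal sketch of one way to get every 'name' value while keeping the original row index. The sample data is made up for illustration, and Series.explode assumes pandas 0.25 or later.

```python
import pandas as pd

# Hypothetical sample data: each cell is a list of dictionaries,
# mirroring the shape suggested by the question's output.
nutrients_df = pd.DataFrame({
    'nutrients': [
        [{'code': '203', 'name': 'Protein'},
         {'code': '204', 'name': 'Total lipid (fat)'}],
        [{'code': '203', 'name': 'Protein'}],
    ]
})

# explode() produces one row per dictionary and repeats the original
# index, so each name can still be traced back to its source row.
exploded = nutrients_df['nutrients'].explode()

# Pull the 'name' value out of every dictionary.
names = exploded.apply(lambda d: d['name']).rename('name')

# Convert the Series to a one-column DataFrame, as asked in the question.
names_df = names.to_frame()
print(names_df)
```

If one row per original record is preferred, a list of names per cell can be produced instead with nutrients_df['nutrients'].apply(lambda lst: [d['name'] for d in lst]).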