Question

I have:

import numpy as np
import pandas as pd
data = np.array([['id','Date','Pers', 'Comb'],
                ['Row1','12-12-2016', 'John', [{"name":"asr","value":"no"},{"name":"flt","value":"641"},{"name":"dest","value":"lax"}]],
                ['Row2','24-12-2016', 'Pete', [{"name":"asr","value":"yes"},{"name":"flt","value":"751"},{"name":"dest","value":"nbo"}]],
                ['Row2','25-12-2016', 'Sue', [{"name":"asr","value":"no"},{"name":"flt","value":"810"},{"name":"dest","value":"tyo"}]]],
               dtype=object)  # object dtype: the Comb cells are lists, so the rows are ragged
df_org = (pd.DataFrame(data=data[1:,1:],
                  index=data[1:,0],
                  columns=data[0,1:]))
df_org

        Date        Pers    Comb
Row1    12-12-2016  John    [{u'name': u'asr', u'value': u'no'}, {u'name':...
Row2    24-12-2016  Pete    [{u'name': u'asr', u'value': u'yes'}, {u'name'...
Row2    25-12-2016  Sue     [{u'name': u'asr', u'value': u'no'}, {u'name':...

I am trying to extract the name values from the Comb column and make each name a new column as well. Like this:

data = np.array([['id','Date','Pers', 'asr', 'flt', 'dest'],
                ['Row1','12-12-2016', 'John', "no", "641", "lax"],
                ['Row2','24-12-2016', 'Pete', "yes","751","nbo"],
                ['Row2','25-12-2016', 'Sue', "no","810","tyo"]])
df_new = (pd.DataFrame(data=data[1:,1:],
                  index=data[1:,0],
                  columns=data[0,1:]))
df_new

        Date        Pers    asr     flt     dest
Row1    12-12-2016  John    no      641     lax
Row2    24-12-2016  Pete    yes     751     nbo
Row2    25-12-2016  Sue     no      810     tyo

I tried to 'unpack' the Comb column with:

for i in range(0, len(df)):
    rowdf = pd.DataFrame(json.loads(df.iloc[i]['Comb']))
    print rowdf['name'], rowdf['value']

But then I don't know how to attach the new data frame to the original row.

Could you help me get from df_org to df_new? Thanks.

Answer 1

With your dummy data, you can do the following:

def iterate(values):
    return pd.Series({x["name"]: x["value"] for x in values})

pd.concat([df_org, df_org.pop("Comb").apply(iterate)], axis=1)

        Date        Pers    asr     dest    flt
Row1    12-12-2016  John    no      lax     641
Row2    24-12-2016  Pete    yes     nbo     751
Row2    25-12-2016  Sue     no      tyo     810

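The helper function iterate extracts the values from the dictionaries and returns them as a pandas Series.
When apply is used with a function that returns Series objects, the result is converted to a pandas DataFrame.
pop returns the given column and removes it from the dataframe.
Finally, concat combines your original df (now without the Comb column) with the values extracted from Comb.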
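
The same steps can be run one at a time to see what each call returns. This is only a minimal sketch: it builds a two-row stand-in for df_org directly from a dict instead of going through np.array (an assumption made for brevity), and the names df, comb, expanded and result are illustrative.

```python
import pandas as pd

# Small stand-in for df_org, built from a dict for brevity (assumption).
df = pd.DataFrame(
    {
        "Date": ["12-12-2016", "24-12-2016"],
        "Pers": ["John", "Pete"],
        "Comb": [
            [{"name": "asr", "value": "no"}, {"name": "flt", "value": "641"}],
            [{"name": "asr", "value": "yes"}, {"name": "flt", "value": "751"}],
        ],
    },
    index=["Row1", "Row2"],
)


def iterate(values):
    # Turn one list of {"name": ..., "value": ...} dicts into a Series
    # whose index holds the names.
    return pd.Series({x["name"]: x["value"] for x in values})


# pop removes "Comb" from df in place and returns it as a Series.
comb = df.pop("Comb")

# apply with a Series-returning function yields a DataFrame,
# one column per name.
expanded = comb.apply(iterate)

# concat with axis=1 places the two frames side by side on the shared index.
result = pd.concat([df, expanded], axis=1)
print(result)
```

On a current Python 3 and pandas the new columns should follow the order in which the names appear in each list, so they may not come out in the alphabetical order shown in the output above.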

Edit: JSON data

If your Comb column holds JSON strings, you can convert them to regular Python objects with json.loads. Just change the iterate function to:

import json

def iterate(values):
    return pd.Series({x["name"]: x["value"] for x in json.loads(values)})
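
To try the JSON variant end to end, here is a small sketch. The frame below is hypothetical: it assumes each Comb cell is a JSON string encoding a list of name/value objects, which is what the json.loads call in the question suggests but the dummy data does not show.

```python
import json

import pandas as pd

# Hypothetical frame where Comb holds JSON text rather than Python lists.
df = pd.DataFrame(
    {
        "Pers": ["John", "Pete"],
        "Comb": [
            '[{"name": "asr", "value": "no"}, {"name": "dest", "value": "lax"}]',
            '[{"name": "asr", "value": "yes"}, {"name": "dest", "value": "nbo"}]',
        ],
    },
    index=["Row1", "Row2"],
)


def iterate(values):
    # json.loads parses the string into a list of dicts first.
    return pd.Series({x["name"]: x["value"] for x in json.loads(values)})


# Same pattern as before: pop the column, expand it, join it back.
result = pd.concat([df, df.pop("Comb").apply(iterate)], axis=1)
print(result)
```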
