How to 'unpack' a pandas DataFrame column

Time: 2017-04-08 16:42:25

Tags: python pandas

I have:

import numpy as np
import pandas as pd
data = np.array([['id','Date','Pers', 'Comb'],
                ['Row1','12-12-2016', 'John', [{"name":"asr","value":"no"},{"name":"flt","value":"641"},{"name":"dest","value":"lax"}]],
                ['Row2','24-12-2016', 'Pete', [{"name":"asr","value":"yes"},{"name":"flt","value":"751"},{"name":"dest","value":"nbo"}]],
                ['Row2','25-12-2016', 'Sue', [{"name":"asr","value":"no"},{"name":"flt","value":"810"},{"name":"dest","value":"tyo"}]]],
                dtype=object)  # keep each Comb cell as a list; recent NumPy rejects ragged input otherwise
df_org = (pd.DataFrame(data=data[1:,1:],
                  index=data[1:,0],
                  columns=data[0,1:]))
df_org

        Date        Pers    Comb
Row1    12-12-2016  John    [{u'name': u'asr', u'value': u'no'}, {u'name':...
Row2    24-12-2016  Pete    [{u'name': u'asr', u'value': u'yes'}, {u'name'...
Row2    25-12-2016  Sue     [{u'name': u'asr', u'value': u'no'}, {u'name':...

I'm trying to extract the name/value pairs from the Comb column and turn each name into a new column, like this:

data = np.array([['id','Date','Pers', 'asr', 'flt', 'dest'],
                ['Row1','12-12-2016', 'John', "no", "641", "lax"],
                ['Row2','24-12-2016', 'Pete', "yes","751","nbo"],
                ['Row2','25-12-2016', 'Sue', "no","810","tyo"]])
df_new = (pd.DataFrame(data=data[1:,1:],
                  index=data[1:,0],
                  columns=data[0,1:]))
df_new

        Date        Pers    asr flt dest
Row1    12-12-2016  John    no  641 lax
Row2    24-12-2016  Pete    yes 751 nbo
Row2    25-12-2016  Sue     no  810 tyo

I tried to 'unpack' the Comb column with:

import json

for i in range(len(df)):
    # each Comb cell is a JSON string here, so parse it into a small frame
    rowdf = pd.DataFrame(json.loads(df.iloc[i]['Comb']))
    print(rowdf['name'], rowdf['value'])

But then I don't know how to attach the new data frame to the original row.

How do I get from df_org to df_new? Thanks.

1 answer:

Answer 0: (score: 1)

With your dummy data, you can do the following:

def iterate(values):
    return pd.Series({x["name"]: x["value"] for x in values})

pd.concat([df_org, df_org.pop("Comb").apply(iterate)], axis=1)

        Date        Pers    asr     dest    flt
Row1    12-12-2016  John    no      lax     641
Row2    24-12-2016  Pete    yes     nbo     751
Row2    25-12-2016  Sue     no      tyo     810
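  • The helper function iterate extracts the name/value pairs from the list of dicts and returns them as a pandas Series.
  • When apply is used with a function that returns a Series, the result is converted to a pandas DataFrame.
  • pop returns the given column and removes it from the data frame.
  • Finally, concat joins your source df (now without the Comb column) with the values extracted from Comb.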
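
Note that pop changes df_org in place, so the Comb column is gone after the call. If the original frame should stay untouched, the same result can be reached with drop, which returns a copy. The following is only a sketch under a few assumptions: the two-row frame is a stand-in for df_org built directly from a dict, and the extracted columns are built with the DataFrame constructor on a list of dicts rather than with apply (a swapped-in technique, not what the answer above uses).

```python
import pandas as pd

# Stand-in for df_org (assumption: built directly, not via np.array),
# so every Comb cell is a plain Python list of dicts.
df_org = pd.DataFrame(
    {
        "Date": ["12-12-2016", "24-12-2016"],
        "Pers": ["John", "Pete"],
        "Comb": [
            [{"name": "asr", "value": "no"},
             {"name": "flt", "value": "641"},
             {"name": "dest", "value": "lax"}],
            [{"name": "asr", "value": "yes"},
             {"name": "flt", "value": "751"},
             {"name": "dest", "value": "nbo"}],
        ],
    },
    index=["Row1", "Row2"],
)

# One {name: value} dict per row, turned into a frame that shares df_org's index.
records = [{x["name"]: x["value"] for x in cell} for cell in df_org["Comb"]]
extracted = pd.DataFrame(records, index=df_org.index)

# drop returns a new frame, so df_org still has its Comb column afterwards.
df_new = pd.concat([df_org.drop(columns="Comb"), extracted], axis=1)
print(df_new)
```

Building all rows in one constructor call also avoids creating a separate Series per row, which may matter on larger frames.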

Edit: JSON data

If your Comb column contains JSON strings, you can convert them to regular Python objects with json.loads. Just change the iterate function to:

import json

def iterate(values):
    return pd.Series({x["name"]: x["value"] for x in json.loads(values)})
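
If it is not certain whether a cell already holds a parsed list or still holds a JSON string, a small wrapper can accept both. This is a minimal sketch, not part of the original answer: the helper name to_mapping and the two-row frame are made up for illustration, and the extracted columns are again built with the DataFrame constructor instead of apply.

```python
import json
import pandas as pd

def to_mapping(cell):
    """Turn one Comb cell into a {name: value} dict.

    Hypothetical helper: accepts either a JSON string or an
    already-parsed list of {"name": ..., "value": ...} dicts.
    """
    items = json.loads(cell) if isinstance(cell, str) else cell
    return {item["name"]: item["value"] for item in items}

# Illustrative data: Row1 holds a JSON string, Row2 an already-parsed list.
df = pd.DataFrame(
    {
        "Pers": ["John", "Pete"],
        "Comb": [
            '[{"name": "asr", "value": "no"}, {"name": "flt", "value": "641"}]',
            [{"name": "asr", "value": "yes"}, {"name": "flt", "value": "751"}],
        ],
    },
    index=["Row1", "Row2"],
)

# Parse every cell, then attach the new columns next to the remaining ones.
extracted = pd.DataFrame([to_mapping(c) for c in df["Comb"]], index=df.index)
result = pd.concat([df.drop(columns="Comb"), extracted], axis=1)
print(result)
```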