I have:
import numpy as np
import pandas as pd
data = np.array([['id','Date','Pers', 'Comb'],
['Row1','12-12-2016', 'John', [{"name":"asr","value":"no"},{"name":"flt","value":"641"},{"name":"dest","value":"lax"}]],
['Row2','24-12-2016', 'Pete', [{"name":"asr","value":"yes"},{"name":"flt","value":"751"},{"name":"dest","value":"nbo"}]],
['Row2','25-12-2016', 'Sue', [{"name":"asr","value":"no"},{"name":"flt","value":"810"},{"name":"dest","value":"tyo"}]]],
dtype=object)  # rows mix strings and lists; recent NumPy rejects this without dtype=object
df_org = (pd.DataFrame(data=data[1:,1:],
index=data[1:,0],
columns=data[0,1:]))
df_org
      Date        Pers  Comb
Row1  12-12-2016  John  [{u'name': u'asr', u'value': u'no'}, {u'name':...
Row2  24-12-2016  Pete  [{u'name': u'asr', u'value': u'yes'}, {u'name'...
Row2  25-12-2016  Sue   [{u'name': u'asr', u'value': u'no'}, {u'name':...
I am trying to extract the name/value pairs from the Comb column and turn each name into a new column, like this:
data = np.array([['id','Date','Pers', 'asr', 'flt', 'dest'],
['Row1','12-12-2016', 'John', "no", "641", "lax"],
['Row2','24-12-2016', 'Pete', "yes","751","nbo"],
['Row2','25-12-2016', 'Sue', "no","810","tyo"]])
df_new = (pd.DataFrame(data=data[1:,1:],
index=data[1:,0],
columns=data[0,1:]))
df_new
      Date        Pers  asr  flt  dest
Row1  12-12-2016  John  no   641  lax
Row2  24-12-2016  Pete  yes  751  nbo
Row2  25-12-2016  Sue   no   810  tyo
I tried 'unpacking' the Comb column with:
for i in range(0, len(df)):
    rowdf = pd.DataFrame(json.loads(df.iloc[i]['Comb']))
    print rowdf['name'], rowdf['value']
But then I don't know how to attach the new data frame to the original row.
Could you please help me get from df_org to df_new? Thanks
Answer 0: (score: 1)
With your dummy data, you can do the following:
def iterate(values):
    return pd.Series({x["name"]: x["value"] for x in values})
pd.concat([df_org, df_org.pop("Comb").apply(iterate)], axis=1)
      Date        Pers  asr  dest  flt
Row1  12-12-2016  John  no   lax   641
Row2  24-12-2016  Pete  yes  nbo   751
Row2  25-12-2016  Sue   no   tyo   810
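iterate extracts the values from the dictionaries and returns them as a pandas Series. When apply is used with a function that returns a Series object, the result is converted to a pandas DataFrame. pop returns the given column and removes it from the data frame, so df_org itself no longer has a Comb column afterwards. Finally, concat merges your source df (without the Comb column) with the values extracted from Comb.

If your Comb column holds JSON strings instead, you can convert them to regular Python objects with json.loads. Just change the iterate function to: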
import json
def iterate(values):
    return pd.Series({x["name"]: x["value"] for x in json.loads(values)})
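
For reference, here is a minimal end-to-end sketch of the same idea that leaves df_org untouched (it uses drop instead of pop). It assumes Python 3 and a reasonably recent pandas. The helper name comb_to_dict is made up for illustration, and the dummy data is rebuilt as a plain DataFrame rather than through np.array. The helper accepts either a list of dicts or a JSON string, which covers both cases discussed above.

```python
import json

import pandas as pd

# Same dummy data as in the question, built directly as a DataFrame.
df_org = pd.DataFrame(
    {
        "Date": ["12-12-2016", "24-12-2016", "25-12-2016"],
        "Pers": ["John", "Pete", "Sue"],
        "Comb": [
            [{"name": "asr", "value": "no"},
             {"name": "flt", "value": "641"},
             {"name": "dest", "value": "lax"}],
            [{"name": "asr", "value": "yes"},
             {"name": "flt", "value": "751"},
             {"name": "dest", "value": "nbo"}],
            [{"name": "asr", "value": "no"},
             {"name": "flt", "value": "810"},
             {"name": "dest", "value": "tyo"}],
        ],
    },
    index=["Row1", "Row2", "Row2"],  # duplicate label kept on purpose
)


def comb_to_dict(values):
    """Hypothetical helper: turn one Comb cell into a {name: value} dict."""
    if isinstance(values, str):
        # The cell holds a JSON string rather than a list of dicts.
        values = json.loads(values)
    return {item["name"]: item["value"] for item in values}


# One dict per row becomes one row of the new frame; reusing the original
# index keeps both frames aligned even with the repeated "Row2" label.
extracted = pd.DataFrame(
    [comb_to_dict(cell) for cell in df_org["Comb"]],
    index=df_org.index,
)

# drop returns a copy without Comb, so df_org keeps its Comb column.
df_new = pd.concat([df_org.drop(columns="Comb"), extracted], axis=1)
print(df_new)
```

Building the frame from a list of dicts avoids creating one Series per row, which may matter on larger data. If a name is missing from some rows, that cell simply ends up as NaN.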