I've processed some very complex nested JSON objects into the following generic dictionary format:
{'key1':'value1',
'key2':'value2',
'key3':'value3',
'key4':'value4',
'key5':[['value5', 'value6', 'value7'], ['value8', 'value9', 'value10']],
'key6':[['value5', 'value6', 'value7'], ['value8', 'value9', 'value10']]}
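Within each list of lists, every inner list represents what should be the equivalent of an "individual transaction". Every transaction shares the key1, key2, key3 and key4 pairs, and there can be any number of inner lists. I'm trying to efficiently convert these into records in a pandas DataFrame, like this: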
key1_field, key2_field, key3_field, key4_field, key5_or_key6_field_1, key5_or_key6_field_2, key5_or_key6_field_3, key5_or_key6_indicator
value1, value2, value3, value4, value5, value6, value7, key5
value1, value2, value3, value4, value5, value6, value7, key6
value1, value2, value3, value4, value8, value9, value10, key5
value1, value2, value3, value4, value8, value9, value10, key6
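The target reshaping can be sketched in plain Python before pandas does any of the work. This is a minimal sketch, assuming that every scalar value is a shared field, that every list-valued key holds the same number of transactions, and that transaction i of key5 pairs with transaction i of key6. The column names `field_1` to `field_3` and `indicator` are hypothetical stand-ins for the headers above.

```python
import pandas as pd

d = {'key1': 'value1',
     'key2': 'value2',
     'key3': 'value3',
     'key4': 'value4',
     'key5': [['value5', 'value6', 'value7'], ['value8', 'value9', 'value10']],
     'key6': [['value5', 'value6', 'value7'], ['value8', 'value9', 'value10']]}

# Assumption: scalar values are the shared fields, list values hold transactions.
shared = {k: v for k, v in d.items() if not isinstance(v, list)}
list_keys = [k for k, v in d.items() if isinstance(v, list)]

# Assumption: all list-valued keys have the same number of transactions.
n = len(d[list_keys[0]])

records = []
for i in range(n):                      # one pass per transaction
    for k in list_keys:                 # one record per list-valued key
        row = dict(shared)              # copy the shared key1..key4 values
        for j, val in enumerate(d[k][i], start=1):
            row[f'field_{j}'] = val     # hypothetical column names
        row['indicator'] = k            # which key the values came from
        records.append(row)

df = pd.DataFrame(records)
print(df)
```

Looping in Python is fine for moderate sizes; the pandas reshapes in the answers express the same result with built-in operations.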
I would sincerely appreciate any help! This has turned into quite a challenge so far. Thanks!
EDIT
As requested, here is what I have been trying so far:
import pandas as pd
import numpy as np
d = {'key1':'value1',
'key2':'value2',
'key3':'value3',
'key4':'value4',
'key5':[['value5', 'value6', 'value7'], ['value8', 'value9', 'value10']],
'key6':[['value5', 'value6', 'value7'], ['value8', 'value9', 'value10']]}
# Wrap every value in a Series; d.items() replaces the Python 2-only d.iteritems()
df = pd.DataFrame({k: pd.Series(v) for k, v in d.items()})
My remaining problem is that the single (scalar) key values are NaN after the first row.
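The NaN comes from index alignment, which a two-column example reproduces. `pd.Series('value1')` is a length-1 Series with index `[0]`, while the list of lists becomes a length-2 Series with index `[0, 1]`. The DataFrame constructor takes the union of the indexes, so the scalar column has no value at index 1. The names `short` and `long` are illustrative only.

```python
import pandas as pd

# A scalar wrapped in a Series becomes a length-1 Series (index [0]).
short = pd.Series('value1')
# A list of lists becomes a length-2 Series of lists (index [0, 1]).
long = pd.Series([['value5', 'value6', 'value7'], ['value8', 'value9', 'value10']])

# The constructor aligns on the union of the indexes, so 'key1' is
# missing at index 1 and pandas fills that cell with NaN.
df = pd.DataFrame({'key1': short, 'key5': long})
print(df)
```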
Answer 0 (score: 2)
One option is to read the dictionary as-is and reshape the DataFrame:
df = pd.DataFrame({'key1':'value1',
'key2':'value2',
'key3':'value3',
'key4':'value4',
'key5':[['value5', 'value6', 'value7'], ['value8', 'value9', 'value10']],
'key6':[['value5', 'value6', 'value7'], ['value8', 'value9', 'value10']]})
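# Move the shared keys into the index, stack key5/key6 into rows,
# expand each list into its own columns, then turn the index back into columns
df.set_index(['key1', 'key2', 'key3', 'key4']).stack().apply(pd.Series) \
    .rename(columns=lambda x: "value_" + str(x)).reset_index()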
# key1 key2 key3 key4 level_4 value_0 value_1 value_2
# 0 value1 value2 value3 value4 key5 value5 value6 value7
# 1 value1 value2 value3 value4 key6 value5 value6 value7
# 2 value1 value2 value3 value4 key5 value8 value9 value10
# 3 value1 value2 value3 value4 key6 value8 value9 value10
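A variant of the same reshape swaps `.apply(pd.Series)`, which builds one Series per stacked row, for a single DataFrame constructor call on `Series.tolist()`. This is a commonly used substitution and is usually cheaper when there are many rows. It is a sketch assuming every inner list has the same length; the `out` name, the `value_` prefix and renaming `level_4` to `indicator` are illustrative choices, not part of the answer above.

```python
import pandas as pd

d = {'key1': 'value1',
     'key2': 'value2',
     'key3': 'value3',
     'key4': 'value4',
     'key5': [['value5', 'value6', 'value7'], ['value8', 'value9', 'value10']],
     'key6': [['value5', 'value6', 'value7'], ['value8', 'value9', 'value10']]}

shared = ['key1', 'key2', 'key3', 'key4']

# Scalars are broadcast to both rows; key5/key6 hold one list per row.
df = pd.DataFrame(d)

# Stack key5/key6 into one Series of lists, indexed by the shared keys
# plus the original column name (key5 or key6).
s = df.set_index(shared).stack()

# Expand all lists in one constructor call instead of one pd.Series per row.
out = (pd.DataFrame(s.tolist(), index=s.index)
         .add_prefix('value_')
         .reset_index()
         .rename(columns={'level_4': 'indicator'}))
print(out)
```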
Answer 1 (score: 1)
Try this:
pd.DataFrame({k: pd.Series(v) for k, v in d.items()}).ffill()
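To see what this produces, the snippet can be run on the question's dictionary. `ffill()` copies the row-0 scalar values down into the NaN gaps, which fixes the problem from the edit. The result still has one row per transaction with the key5/key6 lists in their own columns, so the stack-based reshape from the other answer is still needed for the long format. The name `wide` is illustrative.

```python
import pandas as pd

d = {'key1': 'value1',
     'key2': 'value2',
     'key3': 'value3',
     'key4': 'value4',
     'key5': [['value5', 'value6', 'value7'], ['value8', 'value9', 'value10']],
     'key6': [['value5', 'value6', 'value7'], ['value8', 'value9', 'value10']]}

# Scalar Series have length 1, list Series have length 2; ffill() fills
# the NaN left in row 1 of key1..key4 with the value from row 0.
wide = pd.DataFrame({k: pd.Series(v) for k, v in d.items()}).ffill()
print(wide)
```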