Hi, I have a pandas df similar to the one below
information record
name apple
size {'weight':{'gram':300,'oz':10.5},'description':{'height':10,'width':15}}
country America
partiesrelated [{'nameOfFarmer':'John Smith'},{'farmerID':'A0001'}]
I want to convert the df into another df like this
information record
name apple
size_weight_gram 300
size_weight_oz 10.5
size_description_height 10
size_description_width 15
country America
partiesrelated_nameOfFarmer John Smith
partiesrelated_farmerID A0001
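In this case, each dictionary is parsed into single rows, where a flattened key such as size_weight_gram is the row label and the row contains the value.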
df
import pandas as pd

df = pd.DataFrame({'information': ['name', 'size', 'country', 'partiesrealated'],
                   'record': ['apple', {'weight':{'gram':300,'oz':10.5},'description':{'height':10,'width':15}}, 'America', [{'nameOfFarmer':'John Smith'},{'farmerID':'A0001'}]]})
df = df.set_index('information')
Answer 0 (score: 2)
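IIUC, you can define a recursive function that unnests your sequences/dicts until you are left with a flat list of keys and values, which can both serve as valid input to the pd.DataFrame constructor and be formatted in the way you described.
Take a look at this solution: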
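import collections.abc
import itertools

import pandas as pd

# Flatten one level of nesting: [[a, b], [c]] -> [a, b, c]
ch = lambda ite: list(itertools.chain.from_iterable(ite))

def isseq(obj):
    # Strings are sequences too, but are treated as scalar values here
    if isinstance(obj, str): return False
    return isinstance(obj, collections.abc.Sequence)

def unnest(k, v):
    # Recurse into sequences (same key) and dicts (key extended with "_subkey")
    if isseq(v): return ch([unnest(k, v_) for v_ in v])
    if isinstance(v, dict): return ch([unnest("_".join([k, k_]), v_) for k_, v_ in v.items()])
    return k, v

def pairwise(i):
    # Group a flat [k1, v1, k2, v2, ...] list into [(k1, v1), (k2, v2), ...]
    _a = iter(i)
    return list(zip(_a, _a))

# d is the dict of lists the DataFrame is built from (keys 'information' and 'record')
a = ch([(unnest(k, v)) for k, v in zip(d['information'], d['record'])])
pd.DataFrame(pairwise(a))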
0 1
0 name apple
1 size_weight_gram 300
2 size_weight_oz 10.5
3 size_description_height 10
4 size_description_width 15
5 country America
6 partiesrealated_nameOfFarmer John Smith
7 partiesrealated_farmerID A0001
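The result above carries the default column labels 0 and 1. Below is a minimal sketch of how one might get back to the layout asked for in the question, with the same flattening idea written as a generator. The name `flatten` and the column labels passed to `pd.DataFrame` are illustrative choices, not part of the original answer, and the key is spelled `partiesrelated` as in the question's table.

```python
import collections.abc

import pandas as pd


def flatten(key, value):
    """Yield (flat_key, scalar) pairs from nested dicts and sequences."""
    if isinstance(value, dict):
        # Extend the key with each sub-key, joined by "_"
        for sub_key, sub_value in value.items():
            yield from flatten(f"{key}_{sub_key}", sub_value)
    elif isinstance(value, collections.abc.Sequence) and not isinstance(value, str):
        # Sequences keep the parent key; each item is flattened in turn
        for item in value:
            yield from flatten(key, item)
    else:
        yield key, value


d = {
    'information': ['name', 'size', 'country', 'partiesrelated'],
    'record': [
        'apple',
        {'weight': {'gram': 300, 'oz': 10.5},
         'description': {'height': 10, 'width': 15}},
        'America',
        [{'nameOfFarmer': 'John Smith'}, {'farmerID': 'A0001'}],
    ],
}

pairs = [pair
         for k, v in zip(d['information'], d['record'])
         for pair in flatten(k, v)]

# Name the columns as in the question and use 'information' as the index
out = pd.DataFrame(pairs, columns=['information', 'record']).set_index('information')
print(out)
```

One thing to keep in mind with this sketch: a list of plain scalars (for example `[1, 2]`) would produce several rows with the same key, so the index would not be unique in that case.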
Because the solution is recursive, the algorithm unnests your data to whatever depth it happens to have. For example:
d = {
    'information': ['row1', 'row2', 'row3', 'row4'],
    'record': [
        'val1',
        {
            'val2': {'a': 300, 'b': [{'b1': 10.5}, {'b2': 2}]},
            'val3': {'a': 10, 'b': 15},
        },
        'val4',
        [
            {'val5': [{'a': {'c': [{'d': {'e': [{'f': 1}, {'g': 3}]}}]}}]},
            {'b': 'bar'},
        ],
    ],
}
0 1
0 row1 val1
1 row2_val2_a 300
2 row2_val2_b_b1 10.5
3 row2_val2_b_b2 2
4 row2_val3_a 10
5 row2_val3_b 15
6 row3 val4
7 row4_val5_a_c_d_e_f 1
8 row4_val5_a_c_d_e_g 3
9 row4_b bar
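As a side note, for records that nest only dicts, pandas has a built-in that does a similar flattening: `pd.json_normalize` with its `sep` argument. This is a different technique from the recursive function above, shown here only for comparison. The `record` dict is a made-up subset of the question's data, and as far as I know `json_normalize` does not expand lists of dicts on its own, so the recursive approach remains the more general one.

```python
import pandas as pd

# Hypothetical subset of the question's data: nested dicts only, no lists
record = {
    'name': 'apple',
    'size': {'weight': {'gram': 300, 'oz': 10.5},
             'description': {'height': 10, 'width': 15}},
}

# One row, one column per leaf value; nested keys are joined with "_"
flat = pd.json_normalize(record, sep='_')

# Transpose to get one flattened key per row, closer to the requested layout
print(flat.T)
```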