I have a DataFrame like this:
| col1 | d
-------------------------------------------
0 | A | {'student99': [[0.83, "nice"]]}
1 | B | {'student99': [[0.84, "great"], [0.89, "cool"]]}
2 | C | {'student98': [[0.85, "amazing"]], 'student97': [[0.9, "crazy"]]}
and I am trying to convert it into the following DataFrame:
| col1 | student | grade | comment
---------------------------------------------
0 | A | student99| 0.83 | nice
1 | B | student99| 0.84 | great
2 | B | student99| 0.89 | cool
3 | C | student98| 0.85 | amazing
4 | C | student97| 0.9 | crazy
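As you can see, I need to split the d column into student, grade and comment columns. Each row also has to be split into several rows: one per key in the d column (as in row C above) and one per list under each key (as in row B above).
How can I do this?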
Following the comments, I noticed that the data arrives as JSON in the following format (which I then convert into a DataFrame):
{"A": {"d" : {'student99': [[0.83, "nice"]]}},
"B": {"d" : {'student99': [[0.84, "great"], [0.89, "cool"]]}},
"C": {"d" : {'student98': [[0.85, "amazing"]], 'student97': [[0.9, "crazy"]]}}
}
Answer 0 (score: 5)
We can expand d with pd.Series, explode the result, then build a new DataFrame from it and join it back onto the original:
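import pandas as pd
# Expand each dict into one column per student, stack to long form,
# then explode so each [grade, comment] pair gets its own row.
s = df.pop('d').apply(pd.Series).stack().explode()
# Build the new columns, indexed by the original row labels.
df_add = pd.DataFrame({'student': s.index.get_level_values(1),
                       'grade': s.str[0].values,
                       'comment': s.str[1].values},
                      index=s.index.get_level_values(0))
# A right join repeats col1 once for every exploded row.
df = df.join(df_add, how='right')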
df
Out[169]:
col1 student grade comment
0 A student99 0.83 nice
1 B student99 0.84 great
1 B student99 0.89 cool
2 C student98 0.85 amazing
2 C student97 0.90 crazy
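To make the intermediate steps easier to follow, here is a self-contained sketch that rebuilds the sample frame from the question and runs the same stack/explode chain one step at a time. The variable names (`wide`, `stacked`, `pairs`, `extra`) are illustrative only, and the `dropna()` and `reset_index(drop=True)` calls are additions that are not in the answer above: the first is a safeguard in case `stack()` keeps the empty cells, the second renumbers the rows 0 to 4 as in the question's desired output.

```python
import pandas as pd

# Sample frame from the question.
df = pd.DataFrame({
    "col1": ["A", "B", "C"],
    "d": [
        {"student99": [[0.83, "nice"]]},
        {"student99": [[0.84, "great"], [0.89, "cool"]]},
        {"student98": [[0.85, "amazing"]], "student97": [[0.9, "crazy"]]},
    ],
})

# One column per student key; NaN where a row lacks that key.
wide = df["d"].apply(pd.Series)

# Long format indexed by (row label, student).
# dropna() is an added safeguard, not part of the original answer.
stacked = wide.stack().dropna()

# Each cell holds a list of [grade, comment] pairs; explode yields one pair per row.
pairs = stacked.explode()

# New columns, indexed by the original row labels so they can be joined back.
extra = pd.DataFrame(
    {
        "student": pairs.index.get_level_values(1),
        "grade": [pair[0] for pair in pairs],
        "comment": [pair[1] for pair in pairs],
    },
    index=pairs.index.get_level_values(0),
)

# The right join repeats col1 for every exploded row; the index is then renumbered.
result = df[["col1"]].join(extra, how="right").reset_index(drop=True)
print(result)
```

Printing `wide`, `stacked` and `pairs` in turn is probably the quickest way to see what each step of the one-liner contributes.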
Answer 1 (score: 1)
@YOBEN_S's solution is great; this is an attempt at a faster one:
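import pandas as pd
from itertools import product, chain

# emp is not defined in the original answer; it is assumed here to be the
# (col1, dict) pairs from the question's DataFrame (before d is popped)
emp = list(zip(df["col1"], df["d"]))

# chain.from_iterable is long; flatten is shorter
# and still gets the point across
flatten = chain.from_iterable

# flatten the product of each key, value pair in the dictionary
m = [(first, flatten(product([key], value) for key, value in last.items()))
     for first, last in emp]

# flatten again; first is wrapped in a list so that a multi-character
# label is kept whole instead of being split into characters
phase2 = flatten(product([first], last) for first, last in m)

# at this point we have the column entry ("A", "B", ...)
# and the flattened entries in the dict, so unpacking solves this
phase3 = [(first, second, *last) for first, (second, last) in phase2]
result = pd.DataFrame(phase3, columns=["col1", "student", "grade", "comment"])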
result
col1 student grade comment
0 A student99 0.83 nice
1 B student99 0.84 great
2 B student99 0.89 cool
3 C student98 0.85 amazing
4 C student97 0.90 crazy
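Since the question's update says the data arrives as a nested mapping keyed by the col1 value, the rows can also be built straight from that mapping without an intermediate DataFrame. This is only a minimal sketch, not the method of either answer: it assumes the payload has already been parsed into a Python dict, here given the hypothetical name `data`, with the shape shown in the question.

```python
# Assumed input: the parsed payload, shaped like the question's update.
# The name `data` is hypothetical.
data = {
    "A": {"d": {"student99": [[0.83, "nice"]]}},
    "B": {"d": {"student99": [[0.84, "great"], [0.89, "cool"]]}},
    "C": {"d": {"student98": [[0.85, "amazing"]], "student97": [[0.9, "crazy"]]}},
}

# One output row per [grade, comment] pair, in the order the keys appear.
rows = [
    (col1, student, grade, comment)
    for col1, payload in data.items()
    for student, pairs in payload["d"].items()
    for grade, comment in pairs
]

for row in rows:
    print(row)
```

Passing `rows` to `pd.DataFrame(rows, columns=["col1", "student", "grade", "comment"])` should then give the same frame as above. Whether this is faster than the itertools version would need to be measured on real data.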