I have an input dataframe like this:
import pandas as pd
df_in = pd.DataFrame({'Name': ['Isha', 'Amy', 'Ann'], 'Classes': [
    [('Mon', 'Math'), ('Mon', 'Science'), ('Tue', 'English')],
    [('Mon', 'Math'), ('Wed', 'Science'), ('Tue', 'English')],
    [('Mon', 'Math'), ('Wed', 'Science'), ('Wed', 'English')],
]})
I need an output dataframe like this:
df_out = pd.DataFrame({'Name': ['Isha', 'Amy', 'Ann'], 'Classes': [
    {'Mon': ['Math', 'Science'], 'Tue': ['English']},
    {'Mon': ['Math'], 'Wed': ['Science'], 'Tue': ['English']},
    {'Mon': ['Math'], 'Wed': ['Science', 'English']},
]})
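Is there a function 'fun' that, used with df_in['Classes'].apply(fun) or something similar, would convert the Classes column to the format shown in df_out? I tried using defaultdict and a few other things, but couldn't get it to work. Thanks!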
Answer 0 (score: 3)
You can use defaultdict like this:
from collections import defaultdict
def tuple_to_dict(tuples):
    # Missing keys start out as an empty list
    d = defaultdict(list)
    for k, v in tuples:
        # Append each subject under its day
        d[k].append(v)
    return d
df_in['Classes'] = df_in['Classes'].apply(tuple_to_dict)
df_in
# Classes Name
#0 {u'Mon': [u'Math', u'Science'], u'Tue': [u'Eng... Isha
#1 {u'Tue': [u'English'], u'Mon': [u'Math'], u'We... Amy
#2 {u'Mon': [u'Math'], u'Wed': [u'Science', u'Eng... Ann
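If plain dict objects are wanted rather than defaultdict instances (df_out in the question holds plain dicts), a small variation using dict.setdefault may be enough. This is only a minimal sketch: the name tuples_to_plain_dict is hypothetical, and the rows are written as a plain list here so the snippet runs without pandas.

```python
def tuples_to_plain_dict(pairs):
    """Group (day, subject) pairs into a plain dict of lists, keeping input order."""
    grouped = {}
    for day, subject in pairs:
        # setdefault creates the list the first time a day is seen
        grouped.setdefault(day, []).append(subject)
    return grouped


# Same data as the Classes column in the question
rows = [
    [('Mon', 'Math'), ('Mon', 'Science'), ('Tue', 'English')],
    [('Mon', 'Math'), ('Wed', 'Science'), ('Tue', 'English')],
    [('Mon', 'Math'), ('Wed', 'Science'), ('Wed', 'English')],
]
converted = [tuples_to_plain_dict(row) for row in rows]
print(converted[0])  # {'Mon': ['Math', 'Science'], 'Tue': ['English']}

# With pandas, the same function can be passed to apply:
# df_in['Classes'] = df_in['Classes'].apply(tuples_to_plain_dict)
```

The result compares equal to what the defaultdict version produces; the only difference is the type of the returned object.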
Timings for map vs. apply:
df = pd.concat([df_in] * 10000)
%timeit df['Classes'].apply(tuple_to_dict)
# 41.2 ms ± 872 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit df['Classes'].map(tuple_to_dict)
# 39.2 ms ± 945 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
Answer 1 (score: 1)
Using itertools:
import itertools as it
# Sort each row's tuples so equal days are adjacent, then group by day
df_in['New'] = [
    {k: [x[1] for x in v] for k, v in it.groupby(sorted(y), key=lambda x: x[0])}
    for y in df_in.Classes
]
df_in
Out[607]:
Classes Name New
0 [(Mon, Math), (Mon, Science), (Tue, English)] Isha {'Mon': ['Math', 'Science'], 'Tue': ['English']}
1 [(Mon, Math), (Wed, Science), (Tue, English)] Amy {'Mon': ['Math'], 'Tue': ['English'], 'Wed': [...
2 [(Mon, Math), (Wed, Science), (Wed, English)] Ann {'Mon': ['Math'], 'Wed': ['English', 'Science']}
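One thing to watch: sorted(y) compares whole tuples, so the subjects within a day come out in alphabetical order (row 2 shows ['English', 'Science'], while df_out in the question has ['Science', 'English']). If the original order matters, one option is to sort on the day only and rely on Python's sort being stable. A minimal sketch, where group_by_day is a hypothetical helper name and the input is a plain list so it runs without pandas:

```python
import itertools as it


def group_by_day(pairs):
    """Group (day, subject) pairs by day, keeping each day's subjects in input order."""
    def day(pair):
        return pair[0]

    # sorted() is stable, so pairs with the same day keep their relative order
    ordered = sorted(pairs, key=day)
    return {k: [subject for _, subject in group] for k, group in it.groupby(ordered, key=day)}


ann = [('Mon', 'Math'), ('Wed', 'Science'), ('Wed', 'English')]
result = group_by_day(ann)
print(result)  # {'Mon': ['Math'], 'Wed': ['Science', 'English']}

# With pandas:
# df_in['New'] = df_in['Classes'].apply(group_by_day)
```

The days themselves are still ordered alphabetically as dict keys, since the sort key is the day name.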