Converting a list of tuples in a Python pandas DataFrame column to a dictionary of lists using apply

Asked: 2018-05-16 15:44:16

Tags: pandas dataframe

I have an input DataFrame like this:

import pandas as pd

df_in = pd.DataFrame({'Name': ['Isha', 'Amy', 'Ann'], 'Classes': [
    [('Mon', 'Math'), ('Mon', 'Science'), ('Tue', 'English')],
    [('Mon', 'Math'), ('Wed', 'Science'), ('Tue', 'English')],
    [('Mon', 'Math'), ('Wed', 'Science'), ('Wed', 'English')],
]})

I need an output DataFrame like this:

df_out = pd.DataFrame({'Name': ['Isha', 'Amy', 'Ann'], 'Classes': [
    {'Mon': ['Math', 'Science'], 'Tue': ['English']},
    {'Mon': ['Math'], 'Wed': ['Science'], 'Tue': ['English']},
    {'Mon': ['Math'], 'Wed': ['Science', 'English']},
]})

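Can we write a function 'fun' such that df_in['Classes'].apply(fun) converts the Classes column into the format shown in df_out? I tried using defaultdict and similar approaches, but could not get one to work. Thanks!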

2 Answers:

Answer 0 (score: 3)

You can use defaultdict like this:

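from collections import defaultdict

def tuple_to_dict(tuples):
    # Missing keys start out as an empty list
    d = defaultdict(list)

    # Append each value to the list for its key
    for k, v in tuples:
        d[k].append(v)
    return d

# Overwrites the Classes column in place
df_in['Classes'] = df_in['Classes'].apply(tuple_to_dict)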

df_in
#                                             Classes  Name
#0  {u'Mon': [u'Math', u'Science'], u'Tue': [u'Eng...  Isha
#1  {u'Tue': [u'English'], u'Mon': [u'Math'], u'We...   Amy
#2  {u'Mon': [u'Math'], u'Wed': [u'Science', u'Eng...   Ann
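
If a plain dict is preferred in the column (for example, so that looking up a missing day raises KeyError instead of silently creating an empty list), a minimal sketch along the same lines could use dict.setdefault. The helper name tuples_to_plain_dict is made up for this illustration and is not part of the answer above.

```python
import pandas as pd

def tuples_to_plain_dict(tuples):
    # Hypothetical helper name, used only for this sketch.
    d = {}
    for day, subject in tuples:
        # setdefault creates the list on first sight of a day
        d.setdefault(day, []).append(subject)
    return d

df_in = pd.DataFrame({'Name': ['Isha', 'Amy', 'Ann'], 'Classes': [
    [('Mon', 'Math'), ('Mon', 'Science'), ('Tue', 'English')],
    [('Mon', 'Math'), ('Wed', 'Science'), ('Tue', 'English')],
    [('Mon', 'Math'), ('Wed', 'Science'), ('Wed', 'English')],
]})

# Returns a new Series of plain dicts; df_in itself is left unchanged
result = df_in['Classes'].apply(tuples_to_plain_dict)
```

Assigning result back to df_in['Classes'] would give the same layout as df_out.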

Timings for map and apply:

df = pd.concat([df_in] * 10000)

%timeit df['Classes'].apply(tuple_to_dict)
# 41.2 ms ± 872 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit df['Classes'].map(tuple_to_dict)
# 39.2 ms ± 945 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
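
The %timeit lines above only work in IPython or Jupyter. Outside of those, a rough equivalent can be sketched with the standard timeit module, assuming the same tuple_to_dict function and input data; the loop count of 10 is an arbitrary choice, and the numbers will differ from machine to machine.

```python
import timeit
from collections import defaultdict

import pandas as pd

def tuple_to_dict(tuples):
    d = defaultdict(list)
    for k, v in tuples:
        d[k].append(v)
    return d

df_in = pd.DataFrame({'Name': ['Isha', 'Amy', 'Ann'], 'Classes': [
    [('Mon', 'Math'), ('Mon', 'Science'), ('Tue', 'English')],
    [('Mon', 'Math'), ('Wed', 'Science'), ('Tue', 'English')],
    [('Mon', 'Math'), ('Wed', 'Science'), ('Wed', 'English')],
]})
df = pd.concat([df_in] * 10000)

loops = 10  # arbitrary; raise it for steadier numbers
apply_time = timeit.timeit(lambda: df['Classes'].apply(tuple_to_dict), number=loops)
map_time = timeit.timeit(lambda: df['Classes'].map(tuple_to_dict), number=loops)

print(f"apply: {apply_time / loops * 1000:.1f} ms per loop")
print(f"map:   {map_time / loops * 1000:.1f} ms per loop")
```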

Answer 1 (score: 1)

Using itertools:

import itertools as it

df_in['New'] = [
    {k: [x[1] for x in v] for k, v in it.groupby(sorted(y), key=lambda x: x[0])}
    for y in df_in.Classes
]
df_in
Out[607]: 
                                         Classes  Name                                                New
0  [(Mon, Math), (Mon, Science), (Tue, English)]  Isha   {'Mon': ['Math', 'Science'], 'Tue': ['English']}
1  [(Mon, Math), (Wed, Science), (Tue, English)]   Amy  {'Mon': ['Math'], 'Tue': ['English'], 'Wed': [...
2  [(Mon, Math), (Wed, Science), (Wed, English)]   Ann   {'Mon': ['Math'], 'Wed': ['English', 'Science']}
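
Note that sorted(y) orders the tuples by day and then by subject, so within each day the subjects come out alphabetically: row 2 shows ['English', 'Science'] rather than ['Science', 'English'] as in df_out. If the original subject order matters, one possible variation, sketched here under that assumption, sorts on the day only. Python's sort is stable, so subjects keep their input order within each day. The helper name group_by_day is hypothetical.

```python
import itertools as it
from operator import itemgetter

import pandas as pd

def group_by_day(pairs):
    # Hypothetical helper name, used only for this sketch.
    # Sort on the day alone; the stable sort keeps subject order within a day.
    by_day = sorted(pairs, key=itemgetter(0))
    return {day: [subject for _, subject in group]
            for day, group in it.groupby(by_day, key=itemgetter(0))}

df_in = pd.DataFrame({'Name': ['Isha', 'Amy', 'Ann'], 'Classes': [
    [('Mon', 'Math'), ('Mon', 'Science'), ('Tue', 'English')],
    [('Mon', 'Math'), ('Wed', 'Science'), ('Tue', 'English')],
    [('Mon', 'Math'), ('Wed', 'Science'), ('Wed', 'English')],
]})

df_in['New'] = [group_by_day(y) for y in df_in['Classes']]
```

The days are still ordered alphabetically in each dict, since groupby needs equal keys to be adjacent.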