Question

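I have a DataFrame column df['TITLE'] in which each row is a book sold at some location LOCATION_ID. I want to group df by LOCATION_ID and create a new DataFrame with two columns: LOCATION_ID and a dictionary of title counts for the books sold at each location.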

Specifically, I tried something like this:

from collections import Counter
new_df = df.groupby('LOCATION_ID')['TITLE'].apply(lambda x: Counter(x))

I expected output like this:

LOCATION_ID  |     TITLES
1                 {'TitleA':12; 'TitleB':56 ; ...}
2                 {'TitleK':5; 'TitleC':23 ; ...}
...

But instead I get:

LOCATION_ID                         Title                             
1               TitleA               12
                TitleB               56
...
2               TitleK              5
                TitleG              23
...

Thanks for your help.

Answer 1

Use agg instead of apply:

import numpy as np
import pandas as pd
from collections import Counter
prng = np.random.RandomState(0)
df = pd.DataFrame({'LOCATION_ID': prng.choice([1, 2, 3], 1000), 'TITLE': [''.join(prng.choice(list("abcd"), 3)) for _ in range(1000)]})
df.head()
Out[9]: 
   LOCATION_ID TITLE
0            1   bbb
1            2   bab
2            1   daa
3            2   dcd
4            2   cbc

df.groupby('LOCATION_ID')['TITLE'].apply(lambda x: Counter(x)).head()
Out[10]: 
LOCATION_ID     
1            aaa    2.0
             aab    5.0
             aac    4.0
             aad    3.0
             aba    8.0
dtype: float64

df.groupby('LOCATION_ID')['TITLE'].agg(lambda x: Counter(x))
Out[11]: 
LOCATION_ID
1    {u'cbb': 5, u'cbc': 8, u'cba': 6, u'cda': 8, u...
2    {u'cdd': 5, u'cbc': 7, u'cbb': 1, u'cba': 4, u...
3    {u'cbb': 6, u'cbc': 7, u'cba': 4, u'cda': 6, u...
Name: TITLE, dtype: object

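Your expectation makes sense: when you group items together, you expect pandas to return one result per group. However, groupby.apply is documented as a "flexible apply": it inspects the object your function returns and infers how to combine the results. Here it sees a dictionary and, trying to give you a nicer output, expands it into a MultiIndex Series. agg, by contrast, treats the return value as a single aggregated value per group, so each Counter stays intact.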
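To get the two-column DataFrame the question asks for, you can follow the agg call with rename and reset_index. This is only a minimal sketch on a small made-up frame; the sample data and the TITLES column name are my assumptions (TITLES is taken from the expected output in the question), and behaviour may differ slightly across pandas versions.

```python
from collections import Counter

import pandas as pd

# Hypothetical sample data, for illustration only
df = pd.DataFrame({
    'LOCATION_ID': [1, 1, 1, 2, 2],
    'TITLE': ['TitleA', 'TitleB', 'TitleA', 'TitleK', 'TitleK'],
})

# agg keeps each Counter as one object per group (no MultiIndex expansion)
counts = df.groupby('LOCATION_ID')['TITLE'].agg(lambda x: Counter(x))

# Name the value column and turn the group keys back into a regular column
new_df = counts.rename('TITLES').reset_index()
print(new_df)
```

Each cell of new_df['TITLES'] is a Counter, which is a dict subclass, so you can index it like a normal dictionary or convert it with dict(...) if you need a plain dict.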
