Question

As an example, suppose I have a Python pandas DataFrame like the following:

#  PERSON  THINGS
0  Joe     Candy Corn, Popsicles
1  Jane    Popsicles
2  John    Candy Corn, Ice Packs
3  Lefty   Ice Packs, Hot Dogs

I would like to use the pandas groupby functionality to get the following output:

THINGS        COUNT
Candy Corn    2
Popsicles     2
Ice Packs     2
Hot Dogs      1

I generally understand the following groupby command:

df.groupby(['THINGS']).count()

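However, the output is grouped by the entire string rather than by the individual items. I think I understand why this happens, but I am not clear on how best to approach the problem so that I get the desired output instead of the following: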

THINGS                  PERSON
Candy Corn, Ice Packs   1
Candy Corn, Popsicles   1
Ice Packs, Hot Dogs     1
Popsicles               1

Does pandas have something similar to LIKE in SQL, or am I thinking about how to do this in pandas the wrong way?

Any help is appreciated.

Answer 1

Create a Series by splitting the strings into individual items, then use value_counts:

In [292]: pd.Series(df.THINGS.str.cat(sep=', ').split(', ')).value_counts()
Out[292]:
Popsicles     2
Ice Packs     2
Candy Corn    2
Hot Dogs      1
dtype: int64
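
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea. The DataFrame is rebuilt from the table in the question; the variable names joined and counts are just illustrative. Note that this approach assumes every separator is exactly ', ' (comma followed by one space), as in the sample data.

```python
import pandas as pd

# Rebuild the sample data from the question.
df = pd.DataFrame({
    "PERSON": ["Joe", "Jane", "John", "Lefty"],
    "THINGS": [
        "Candy Corn, Popsicles",
        "Popsicles",
        "Candy Corn, Ice Packs",
        "Ice Packs, Hot Dogs",
    ],
})

# Join every row into one long string, using the same ', ' separator
# that is used inside each row.
joined = df["THINGS"].str.cat(sep=", ")

# Split the long string back into individual items and count them.
counts = pd.Series(joined.split(", ")).value_counts()
print(counts)
```

The order of items that share the same count may differ between pandas versions.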

Answer 2

You need to split THINGS on commas, flatten the resulting lists into a single Series, and count the values.

pd.Series([item.strip() for sublist in df['THINGS'].str.split(',') for item in sublist]).value_counts()

Output:

Candy Corn    2
Popsicles     2
Ice Packs     2
Hot Dogs      1
dtype: int64
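
If your pandas version has explode (0.25 or later, as far as I know), the flattening step can also be done without a list comprehension, and the result can be fed to an actual groupby as the question asked. This is only a sketch under that assumption; the names items, exploded, and by_thing are mine and not from the answers above.

```python
import pandas as pd

# Rebuild the sample data from the question.
df = pd.DataFrame({
    "PERSON": ["Joe", "Jane", "John", "Lefty"],
    "THINGS": [
        "Candy Corn, Popsicles",
        "Popsicles",
        "Candy Corn, Ice Packs",
        "Ice Packs, Hot Dogs",
    ],
})

# Variant 1: split each row into a list, give every list element its own
# row, strip stray whitespace, and count.
items = df["THINGS"].str.split(",").explode().str.strip()
counts = items.value_counts()
print(counts)

# Variant 2: keep PERSON next to each item so groupby works per item.
exploded = (
    df.assign(THINGS=df["THINGS"].str.split(","))
      .explode("THINGS")
      .reset_index(drop=True)  # explode repeats index labels; make them unique
)
exploded["THINGS"] = exploded["THINGS"].str.strip()
by_thing = exploded.groupby("THINGS")["PERSON"].count()
print(by_thing)
```

The second variant is handy if you later want more than a count, for example the list of people per item.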

Groupby on comma-separated values in a single DataFrame column (python / pandas)
