Question

I have a df like this:

import pandas as pd

# DataFrame.append was removed in pandas 2.0, so build the frame in one call
df = pd.DataFrame({
    'Concat': ['abc', 'abc', 'abc', 'abc', 'def', 'def', 'def'],
    'SearchTerm': ['aa', 'aab', 'aac', 'ddd', 'cef', 'plo', 'cefa'],
})

print(df)
  Concat SearchTerm
0    abc         aa
1    abc        aab
2    abc        aac
3    abc        ddd
4    def        cef
5    def        plo
6    def       cefa

I want to group the df by Concat and count how many times each SearchTerm occurs within the strings of that subset. The final result should look like this:

  Concat SearchTerm Count
0    abc         aa     3
1    abc        aab     1
2    abc        aac     1
3    abc        ddd     1
4    def        cef     2
5    def        plo     1
6    def       cefa     1

For Concat abc, aa is found 3 times among the 4 search terms. I can get the solution with a loop, but that is too slow for my larger dataset.

I have already tried two solutions, one from this thread and one from this thread.

df['Count'] = df['SearchTerm'].str.contains(df['SearchTerm']).groupby(df['Concat']).sum()
df['Count'] = df.groupby(['Concat'])['SearchTerm'].transform(lambda x: x[x.str.contains(x)].count())

In both cases I get a TypeError:

'Series' objects are mutable, thus they cannot be hashed

Any help would be appreciated.

Answer 1

Use transform and a list comprehension:

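# Join every SearchTerm of a group into one '|'-separated string, broadcast to each row
s = df.groupby('Concat').SearchTerm.transform('|'.join)
# Count the occurrences of each row's term inside its group's joined string
df['Count'] = [joined.count(term) for joined, term in zip(s, df.SearchTerm)]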

Out[77]:
  Concat SearchTerm  Count
0    abc         aa      3
1    abc        aab      1
2    abc        aac      1
3    abc        ddd      1
4    def        cef      2
5    def        plo      1
6    def       cefa      1
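
One thing to be aware of: str.count counts non-overlapping substring occurrences, so a term that appears twice inside a single string is counted twice. If what you want is the number of terms in the group that contain each term, a variant along these lines might work. This is only a sketch under that assumption; count_containing is a helper name made up for illustration, and it is still quadratic in the group size.

```python
import pandas as pd

df = pd.DataFrame({
    "Concat": ["abc", "abc", "abc", "abc", "def", "def", "def"],
    "SearchTerm": ["aa", "aab", "aac", "ddd", "cef", "plo", "cefa"],
})

def count_containing(terms: pd.Series) -> pd.Series:
    """Hypothetical helper: for each term, count the terms in the group that contain it."""
    values = terms.tolist()
    # `term in other` is a plain substring test, so each string counts at most once
    counts = [sum(term in other for other in values) for term in values]
    # Keep the group's index so transform can align the result with the original rows
    return pd.Series(counts, index=terms.index)

df["Count"] = df.groupby("Concat")["SearchTerm"].transform(count_containing)
print(df)
```

On the sample data this gives the same counts as the join-based version. The two only differ when a term repeats inside one string, for example aa inside aaaa.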
