Question

I have a pandas DataFrame df with a string column Posts that looks like this:

df['Posts']
0       this is an example sentence
1       this too is an example too is an example sentence
2       yup, still an example sentence

I also have another DataFrame, df1, whose Phrases column holds a list of tags, like this:

df1['Phrases']
0       example
1       example sentence
2       is an
3       is an example
4       yup

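I need a DataFrame giving, for each entry in the Phrases column of df1, the unique count of its occurrences in the Posts column of df, like this: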

Posts

Answer 1

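Use str.extract, then flag the non-missing values with notnull and count them with sum. Each True is treated as 1, so the sum is the number of posts that contain the phrase: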

df1['Count'] = [df['Posts'].str.extract('(' + x + ')', expand=False).notnull().sum()
                     for x in df1['Phrases']]
print (df1)
            Phrases  Count
0           example      3
1  example sentence      3
2             is an      2
3     is an example      2
4               yup      1
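
For anyone who wants to reproduce the result above, here is a minimal self-contained sketch. The two DataFrames are rebuilt by hand from the sample data in the question (their construction is my assumption, not part of the original answer); the counting line is the same one shown above.

```python
import pandas as pd

# Sample data rebuilt from the question (assumed construction).
df = pd.DataFrame({'Posts': ['this is an example sentence',
                             'this too is an example too is an example sentence',
                             'yup, still an example sentence']})
df1 = pd.DataFrame({'Phrases': ['example', 'example sentence',
                                'is an', 'is an example', 'yup']})

# For each phrase: extract the first match per post (NaN when there is none),
# flag the posts that matched, and sum the True values.
df1['Count'] = [df['Posts'].str.extract('(' + x + ')', expand=False).notnull().sum()
                for x in df1['Phrases']]
print(df1)
```

Note that str.extract returns only the first match in each post, so a phrase repeated inside one post (such as "is an example" in the second row) still counts once.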

Edit:

To avoid counting partial-word matches, add word boundaries:

df1['Count'] = [df['Posts'].str.extract(r'(\b' + x + r'\b)', expand=False).notnull().sum()
                     for x in df1['Phrases']]
print (df1)
            Phrases  Count
0           example      3
1  example sentence      3
2             is an      2
3     is an example      2
4               yup      1
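
The sample phrases happen to give the same counts with and without word boundaries, so the difference is easier to see with a phrase that only occurs inside a longer word. The sketch below is one possible way to show it, not the answer's exact code: count_posts is a hypothetical helper, it uses str.contains instead of str.extract (which should give the same per-post count), and it passes each phrase through re.escape on the assumption that phrases are meant as literal text rather than regex patterns.

```python
import re
import pandas as pd

# Posts column rebuilt from the question (assumed construction).
posts = pd.Series(['this is an example sentence',
                   'this too is an example too is an example sentence',
                   'yup, still an example sentence'])

def count_posts(phrase, whole_words=True):
    """Hypothetical helper: number of posts containing `phrase`."""
    # Escape regex metacharacters so the phrase is matched literally.
    pattern = re.escape(phrase)
    if whole_words:
        # \b anchors the match at word boundaries on both sides.
        pattern = r'\b' + pattern + r'\b'
    return int(posts.str.contains(pattern, regex=True).sum())

print(count_posts('to', whole_words=False))  # 'to' matches inside 'too'
print(count_posts('to'))                     # no standalone word 'to'
```

One caveat: \b only marks a boundary next to a word character, so a phrase that starts or ends with punctuation may need a different kind of anchor.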

Count the unique occurrences of words from a column of one DataFrame in another DataFrame
