Count the unique occurrences in one dataframe of the words from a column in another dataframe

Time: 2018-10-09 11:47:41

Tags: python-3.x pandas dataframe

I have a pandas dataframe df with a string column Posts, which looks like this:

df['Posts']
0       this is an example sentence
1       this too is an example too is an example sentence
2       yup, still an example sentence

I also have another dataframe df1 whose column Phrases holds a list of tags, like this:

df1['Phrases']
0       example
1       example sentence
2       is an
3       is an example
4       yup

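I need a dataframe with the unique count of occurrences of each of df1's Phrases in the Posts column of df, where a post counts once per phrase no matter how many times the phrase appears in it.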

1 answer:

Answer 0: (score: 2)

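Use str.extract, then test for non-missing values with notnull and count them with sum. Each True is treated as 1 when summed, so the total is the number of posts that contain the phrase: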

df1['Count'] = [df['Posts'].str.extract('(' + x + ')', expand=False).notnull().sum()
                     for x in df1['Phrases']]
print (df1)
            Phrases  Count
0           example      3
1  example sentence      3
2             is an      2
3     is an example      2
4               yup      1
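
For anyone who wants to reproduce these counts end to end, here is a minimal self-contained sketch. It rebuilds df and df1 from the samples in the question. Note that it is a variant, not the code above: it swaps in str.contains for str.extract, and the re.escape call is an added assumption so that phrases containing regex metacharacters (such as "." or "+") are matched literally.

```python
import re

import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({"Posts": [
    "this is an example sentence",
    "this too is an example too is an example sentence",
    "yup, still an example sentence",
]})
df1 = pd.DataFrame({"Phrases": [
    "example", "example sentence", "is an", "is an example", "yup",
]})

# For each phrase, str.contains gives one boolean per post, so summing
# counts each post at most once even if the phrase repeats inside it.
# re.escape is an assumption added here, not part of the answer's code.
df1["Count"] = [
    int(df["Posts"].str.contains(re.escape(phrase)).sum())
    for phrase in df1["Phrases"]
]

print(df1)
```

Because str.contains already returns booleans, the notnull step is not needed in this variant. The loop still scans every post once per phrase, so it may be slow for very long phrase lists.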

Edit:

To avoid counting partial matches inside longer words, use word boundaries:

df1['Count'] = [df['Posts'].str.extract(r'(\b' + x + r'\b)', expand=False).notnull().sum()
                     for x in df1['Phrases']]
print (df1)
            Phrases  Count
0           example      3
1  example sentence      3
2             is an      2
3     is an example      2
4               yup      1
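
The sample phrases all happen to be whole words, so both versions print the same counts. The sketch below shows a case where the word boundaries do change the result. The helper name count_posts and the phrase "exam" are hypothetical, introduced only for illustration, and re.escape is again an added assumption.

```python
import re

import pandas as pd

# Same sample posts as in the question.
posts = pd.Series([
    "this is an example sentence",
    "this too is an example too is an example sentence",
    "yup, still an example sentence",
])


def count_posts(posts, phrase, whole_word):
    """Count the posts containing phrase (hypothetical helper).

    With whole_word=True the phrase is wrapped in \\b word boundaries,
    so it no longer matches inside a longer word.
    """
    pattern = re.escape(phrase)
    if whole_word:
        pattern = rf"\b{pattern}\b"
    return int(posts.str.contains(pattern).sum())


phrases = ["exam", "example"]  # "exam" is a made-up partial word
plain = [count_posts(posts, p, whole_word=False) for p in phrases]
bounded = [count_posts(posts, p, whole_word=True) for p in phrases]

print(plain)    # [3, 3]: "exam" matches inside "example"
print(bounded)  # [0, 3]: "exam" never appears as a whole word
```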