Pandas: check column B

Time: 2018-05-18 06:03:59

Tags: python pandas

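I have 100 keywords in df1 and 10,000 articles in df2. I want to count how many articles contain each keyword. For example, about 20 articles contain the keyword "apple".

I tried df.str.contains(), but I have to run it separately for every keyword. Can you suggest an efficient way to do this?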

import pandas as pd

df1 = pd.DataFrame(['apple', 'mac', 'pc', 'ios', 'lg'], columns=['keywords'])
df2 = pd.DataFrame(['apple is good for health', 'mac is another pc', 'today is sunday', 'Star wars pc game', 'ios is a system,lg is not', 'lg is a japan company '], columns=['article'])

Expected result:

1 article contain "apple"
1 article contain "mac"
2 article contain "pc"
1 article contain "ios"
2 article contain "lg"

1 answer:

Answer 0 (score: 2)

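I think you need str.contains to get a boolean Series, and sum to count the True values (each True is counted as 1). Do this for every keyword in a list comprehension and pass the result to the DataFrame constructor: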

L = [(x, df2['article'].str.contains(x).sum()) for x in df1['keywords']]
# Alternative: plain substring test with `in` (works only if there are no NaN values)
# L = [(x, sum(x in article for article in df2['article'])) for x in df1['keywords']]
df3 = pd.DataFrame(L, columns=['keyword', 'count'])
print(df3)
  keyword  count
0   apple      1
1     mac      1
2      pc      2
3     ios      1
4      lg      2
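
One caveat: str.contains treats the pattern as a regular expression by default and matches substrings, so "pc" would also be found inside a longer word such as "pcs". The following is a minimal sketch, assuming whole-word matching is what you want (the question does not say so). The helper name count_whole_word is hypothetical and not part of pandas.

```python
import re

import pandas as pd

df1 = pd.DataFrame(['apple', 'mac', 'pc', 'ios', 'lg'], columns=['keywords'])
df2 = pd.DataFrame(['apple is good for health', 'mac is another pc',
                    'today is sunday', 'Star wars pc game',
                    'ios is a system,lg is not', 'lg is a japan company '],
                   columns=['article'])


def count_whole_word(articles, keyword):
    """Hypothetical helper: count articles containing keyword as a whole word."""
    # re.escape keeps regex metacharacters in the keyword literal;
    # \b anchors the match at word boundaries.
    pattern = r'\b' + re.escape(keyword) + r'\b'
    # na=False treats missing articles as "no match" instead of NaN.
    return int(articles.str.contains(pattern, regex=True, na=False).sum())


whole_word_counts = pd.DataFrame(
    [(k, count_whole_word(df2['article'], k)) for k in df1['keywords']],
    columns=['keyword', 'count'])
print(whole_word_counts)
```

On the sample data this gives the same counts as the substring version, because every keyword there already appears as a separate word.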

If you only want to print the output:

for x in df1['keywords']:
    count = df2['article'].str.contains(x).sum()
    # Alternative if there are no NaN values: sum over a generator that tests membership with `in`
    # count = sum(x in article for article in df2['article'])
    print('{} article contain "{}"'.format(count, x))

1 article contain "apple"
1 article contain "mac"
2 article contain "pc"
1 article contain "ios"
2 article contain "lg"
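
Both versions above scan every article once per keyword. If that becomes slow with 100 keywords and 10,000 articles, one possible variation is to scan each article a single time with a combined pattern. This is only a sketch, not part of the original answer, and it rests on two assumptions: there are no missing articles, and no keyword is a substring of another keyword (a combined alternation reports only non-overlapping matches, so overlapping keywords could be undercounted).

```python
import re
from collections import Counter

import pandas as pd

df1 = pd.DataFrame(['apple', 'mac', 'pc', 'ios', 'lg'], columns=['keywords'])
df2 = pd.DataFrame(['apple is good for health', 'mac is another pc',
                    'today is sunday', 'Star wars pc game',
                    'ios is a system,lg is not', 'lg is a japan company '],
                   columns=['article'])

keywords = df1['keywords'].tolist()

# One alternation pattern covering every keyword, e.g. 'apple|mac|pc|ios|lg'.
pattern = '|'.join(re.escape(k) for k in keywords)

# str.findall returns, for each article, the list of keywords found in it.
found_per_article = df2['article'].str.findall(pattern)

# Count each keyword at most once per article by deduplicating with set().
article_counts = Counter()
for found in found_per_article:
    article_counts.update(set(found))

# Keywords that never occur get a count of 0 from the Counter.
df3 = pd.DataFrame({'keyword': keywords,
                    'count': [article_counts[k] for k in keywords]})
print(df3)
```

Whether this is actually faster depends on the data, so it is worth timing against the list comprehension before adopting it.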