How to count matching words in 2 CSV files

Time: 2019-10-13 09:19:38

Tags: python-3.x pandas csv

I have 2 CSV files, dictionary.csv and story.csv. For each row in story.csv, I want to count how many words match the words in dictionary.csv.

Below is a truncated sample:

Story.csv 
id    STORY
0     Jennie have 2 shoes, a red heels and a blue sneakers
1     The skies are pretty today
2     One of aesthetic color is grey

Dictionary.csv
red
green
grey
blue
black

My expected output is:

output.csv
id    STORY                                                  Found
0     Jennie have 2 shoes, a red heels and a blue sneakers    2
1     The skies are pretty today                              0
2     One of aesthetic color is grey                          1

This is the code I have so far, but I only get NaN (empty cells):

import pandas as pd 
import csv

news=pd.read_csv("Story.csv") 
dictionary=pd.read_csv("Dictionary.csv")


news['STORY'].value_counts()

news['How many found in 1'] = dictionary['Lists'].map(news['STORY'].value_counts())

news.to_csv("output.csv")

I also tried using .str.count, but I keep getting zero.

1 Answer:

Answer 0 (score: 1)

Try this:

import pandas as pd

# Create the sample data frame
data = {'id': [0, 1, 2],
        'STORY': ['Jennie have 2 shoes, a red heels and a blue sneakers',
                  'The skies are pretty today',
                  'One of aesthetic color is grey']}

word_list = ['red', 'green', 'grey', 'blue', 'black']

df = pd.DataFrame(data)

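# Count the occurrences of each dictionary word in every story and sum them.
# Note: str.count matches substrings and is case-sensitive, so 'red' is also counted inside 'hundred'.
df['Found'] = df['STORY'].astype(str).apply(lambda t: pd.Series({word: t.count(word) for word in word_list}).sum())
# Alternatively, a plain sum gives the same result without building a Series per row
#df['Found'] = df['STORY'].astype(str).apply(lambda t: sum([t.count(word) for word in word_list]))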

Output:

df
#   id  STORY                                                Found
#0  0   Jennie have 2 shoes, a red heels and a blue sneakers 2
#1  1   The skies are pretty today                           0
#2  2   One of aesthetic color is grey                       1
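
The snippet above hardcodes the data and counts substrings. If the words should come from the files themselves and only whole words should match, something like the following may work. This is a minimal sketch under some assumptions: Dictionary.csv has no header row and holds one word per line (as in the sample), Story.csv is comma-separated with id and STORY columns, and matching should ignore case. The io.StringIO objects are stand-ins for the real files so that the example runs on its own.

```python
import io
import re

import pandas as pd

# Stand-ins for Story.csv and Dictionary.csv (assumed layout, see above).
story_csv = io.StringIO(
    'id,STORY\n'
    '0,"Jennie have 2 shoes, a red heels and a blue sneakers"\n'
    '1,The skies are pretty today\n'
    '2,One of aesthetic color is grey\n'
)
dictionary_csv = io.StringIO("red\ngreen\ngrey\nblue\nblack\n")

news = pd.read_csv(story_csv)
# header=None because the sample dictionary has no column name.
words = pd.read_csv(dictionary_csv, header=None, names=["word"])["word"].astype(str)

# One alternation pattern; \b restricts matches to whole words.
pattern = r"\b(?:" + "|".join(re.escape(w) for w in words) + r")\b"
news["Found"] = news["STORY"].astype(str).str.count(pattern, flags=re.IGNORECASE)

print(news)
```

With the real files, the two StringIO objects would be replaced by the file paths, and news.to_csv("output.csv", index=False) would write the result.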

Bonus edit: if you want to see a detailed breakdown of the counts per word, run

df['STORY'].astype(str).apply(lambda t: pd.Series({word: t.count(word) for word in word_list}))

#   red     green   grey    blue    black
#0  1       0       0       1       0
#1  0       0       0       0       0
#2  0       0       1       0       0
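
The same breakdown can also be built with whole-word matching, so that a word like "hundred" does not add to the count for "red". The sketch below is one possible way to do it, not the answer's original method; the extra fourth story and the name breakdown are made up for illustration.

```python
import re

import pandas as pd

word_list = ['red', 'green', 'grey', 'blue', 'black']
df = pd.DataFrame({
    'id': [0, 1, 2, 3],
    'STORY': ['Jennie have 2 shoes, a red heels and a blue sneakers',
              'The skies are pretty today',
              'One of aesthetic color is grey',
              'A hundred red balloons'],  # hypothetical row showing the substring pitfall
})

stories = df['STORY'].astype(str)

# One column per dictionary word; \b keeps 'red' from matching inside 'hundred'.
breakdown = pd.DataFrame({
    word: stories.str.count(r'\b' + re.escape(word) + r'\b')
    for word in word_list
})

# The row-wise total is the 'Found' column.
df['Found'] = breakdown.sum(axis=1)

print(breakdown)
print(df)
```

For the last row, a plain t.count('red') would return 2, while the whole-word pattern returns 1.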