Question

I have 2 CSV files, dictionary.csv and story.csv. I want to count how many words in each row of story.csv match a word in dictionary.csv.

Below is a truncated example:

Story.csv 
id    STORY
0     Jennie have 2 shoes, a red heels and a blue sneakers
1     The skies are pretty today
2     One of aesthetic color is grey

Dictionary.csv
red
green
grey
blue
black

My expected output is:

output.csv
id    STORY                                                  Found
0     Jennie have 2 shoes, a red heels and a blue sneakers    2
1     The skies are pretty today                              0
2     One of aesthetic color is grey                          1

This is the code I have so far, but I only get NaN (empty cells):

import pandas as pd 
import csv

news=pd.read_csv("Story.csv") 
dictionary=pd.read_csv("Dictionary.csv")


news['STORY'].value_counts()

news['How many found in 1'] = dictionary['Lists'].map(news['STORY'].value_counts())

news.to_csv("output.csv")

I also tried using .str.count, but I keep getting zeros.

Answer 1

Try this:

import pandas as pd

# create the sample data frame
data = {'id': [0, 1, 2],
        'STORY': ['Jennie have 2 shoes, a red heels and a blue sneakers',
                  'The skies are pretty today',
                  'One of aesthetic color is grey']}

word_list = ['red', 'green', 'grey', 'blue', 'black']

df = pd.DataFrame(data)

# for each story, count the occurrences of every dictionary word (substring match) and sum them
df['Found'] = df['STORY'].astype(str).apply(lambda t: pd.Series({word: t.count(word) for word in word_list}).sum())
# alternatively, the same sum without building an intermediate Series
#df['Found'] = df['STORY'].astype(str).apply(lambda t: sum([t.count(word) for word in word_list]))

Output:

df
#   id  STORY                                                Found
#0  0   Jennie have 2 shoes, a red heels and a blue sneakers 2
#1  1   The skies are pretty today                           0
#2  2   One of aesthetic color is grey                       1
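
One caveat with the snippet above: str.count matches substrings, so a dictionary word such as red would also be counted inside a word like "bored". If only whole words should match, a word-boundary regular expression may be closer to what the question asks for. The following is a minimal sketch under that assumption; the case-insensitive flag is my own choice and not something the question specifies.

```python
import re
import pandas as pd

df = pd.DataFrame({
    "id": [0, 1, 2],
    "STORY": [
        "Jennie have 2 shoes, a red heels and a blue sneakers",
        "The skies are pretty today",
        "One of aesthetic color is grey",
    ],
})
word_list = ["red", "green", "grey", "blue", "black"]

# \b anchors each alternative at word boundaries, so "red" does not match
# inside "bored". re.escape protects against dictionary entries that
# contain regex metacharacters.
pattern = r"\b(?:" + "|".join(re.escape(w) for w in word_list) + r")\b"

# Series.str.count counts the non-overlapping regex matches in each row.
# IGNORECASE is an assumption: drop it if "Red" should not match "red".
df["Found"] = df["STORY"].astype(str).str.count(pattern, flags=re.IGNORECASE)
print(df)
```

On the sample data this gives the same counts as the substring version; the two only diverge once a dictionary word appears inside a longer word.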

Bonus edit: if you want to see a detailed breakdown of the count per word, run:

df['STORY'].astype(str).apply(lambda t: pd.Series({word: t.count(word) for word in word_list}))

#   red     green   grey    blue    black
#0  1       0       0       1       0
#1  0       0       0       0       0
#2  0       0       1       0       0
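
The example above hard-codes the data, so here is a rough sketch of the same counting applied to the two files. The NaN values in the question most likely come from dictionary['Lists'].map(news['STORY'].value_counts()): value_counts() is indexed by whole story strings, so looking up a single word such as "red" in it finds nothing. The sketch assumes Dictionary.csv has a header named Lists (as the question's code implies) and that Story.csv is comma-separated with the story text quoted. It writes small stand-in files to a temporary directory only so that it runs on its own; with real data, point the paths at your own files instead.

```python
import os
import tempfile
import pandas as pd

# Stand-in files so the sketch is runnable; replace these paths with the
# real Story.csv / Dictionary.csv locations.
tmp = tempfile.mkdtemp()
story_path = os.path.join(tmp, "Story.csv")
dict_path = os.path.join(tmp, "Dictionary.csv")
out_path = os.path.join(tmp, "output.csv")

pd.DataFrame({
    "id": [0, 1, 2],
    "STORY": [
        "Jennie have 2 shoes, a red heels and a blue sneakers",
        "The skies are pretty today",
        "One of aesthetic color is grey",
    ],
}).to_csv(story_path, index=False)
# Assumption: the dictionary column is called "Lists", as in the question.
pd.DataFrame({"Lists": ["red", "green", "grey", "blue", "black"]}).to_csv(
    dict_path, index=False
)

news = pd.read_csv(story_path)
dictionary = pd.read_csv(dict_path)

# Turn the dictionary column into a plain list of clean strings.
word_list = dictionary["Lists"].dropna().astype(str).str.strip().tolist()

# Same substring counting as in the answer, applied to each story.
news["Found"] = news["STORY"].astype(str).apply(
    lambda t: sum(t.count(word) for word in word_list)
)

# index=False keeps the extra pandas index column out of output.csv.
news.to_csv(out_path, index=False)
print(news)
```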

How to count matching words across 2 CSV files
