my_list=["one","is"]
df
Out[6]:
Name Story
0 Kumar Kumar is one of the great player in his team
1 Ravi Ravi is a good poet
2 Ram Ram drives well
If any item in my_list appears in the "Story" column, I need to get the number of occurrences of each item.
my_desired_output
new_df
word count
one 1
is 2
I managed to extract the rows that contain any item from my_list using mask=df1["Story"].str.contains('|'.join(my_list),na=False), but now I am trying to get the count of each word in my_list.
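For reference, the row filter described above can be sketched as follows. The DataFrame is rebuilt from the sample shown in the question, and it is named df here rather than df1. One thing worth noting: str.contains performs a regex substring search, so a pattern like "is" would also match inside "his".

```python
import pandas as pd

# Sample data rebuilt from the question
df = pd.DataFrame({
    "Name": ["Kumar", "Ravi", "Ram"],
    "Story": [
        "Kumar is one of the great player in his team",
        "Ravi is a good poet",
        "Ram drives well",
    ],
})
my_list = ["one", "is"]

# Build the regex "one|is" and flag rows whose Story matches it anywhere
mask = df["Story"].str.contains("|".join(my_list), na=False)
print(df[mask])

# Substring caveat: "is" is found inside "his", even without a standalone "is"
substring_hit = pd.Series(["his team"]).str.contains("is").iloc[0]
print(substring_hit)
```

This only tells you which rows contain a match; it does not say how often each word occurs, which is what the question asks for.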
Answer 0 (score: 1)
You can first use str.split with stack to get a Series of words:
a = df['Story'].str.split(expand=True).stack()
print (a)
0 0 Kumar
1 is
2 one
3 of
4 the
5 great
6 player
7 in
8 his
9 team
1 0 Ravi
1 is
2 a
3 good
4 poet
2 0 Ram
1 drives
2 well
dtype: object
Then filter with boolean indexing using isin, get the value_counts, and finally convert the result to a DataFrame with rename_axis and reset_index:
df = a[a.isin(my_list)].value_counts().rename_axis('word').reset_index(name='count')
print (df)
word count
0 is 2
1 one 1
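Put together, the steps above can be sketched as one self-contained script. The sample DataFrame is rebuilt from the question, and the result is stored in new_df (the name used in the desired output) rather than overwriting df.

```python
import pandas as pd

# Sample data rebuilt from the question
df = pd.DataFrame({
    "Name": ["Kumar", "Ravi", "Ram"],
    "Story": [
        "Kumar is one of the great player in his team",
        "Ravi is a good poet",
        "Ram drives well",
    ],
})
my_list = ["one", "is"]

# One word per entry, indexed by (row, position within the sentence)
words = df["Story"].str.split(expand=True).stack()

# Keep only words from my_list, count them, and shape the result as a DataFrame
new_df = (
    words[words.isin(my_list)]
    .value_counts()
    .rename_axis("word")
    .reset_index(name="count")
)
print(new_df)
```

Because isin compares whole tokens, "his" is not counted as an occurrence of "is" here.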
Another solution is to create a list of all words with str.split, flatten it with from_iterable, count with Counter, and finally create the DataFrame with the constructor:
from collections import Counter
from itertools import chain
my_list=["one","is"]
a = list(chain.from_iterable(df['Story'].str.split().values.tolist()))
print (a)
['Kumar', 'is', 'one', 'of', 'the', 'great', 'player',
'in', 'his', 'team', 'Ravi', 'is', 'a', 'good', 'poet', 'Ram', 'drives', 'well']
b = Counter([x for x in a if x in my_list])
print (b)
Counter({'is': 2, 'one': 1})
df = pd.DataFrame({'word':list(b.keys()),'count':list(b.values())}, columns=['word','count'])
print (df)
word count
0 one 1
1 is 2
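A variation on the Counter approach, in case words from my_list that never occur should still be reported with a count of 0, and in the order of my_list. This is only a sketch: the extra word "poetry" is a made-up entry added to show the zero case, and the sample DataFrame is rebuilt from the question.

```python
from collections import Counter

import pandas as pd

# Sample data rebuilt from the question
df = pd.DataFrame({
    "Name": ["Kumar", "Ravi", "Ram"],
    "Story": [
        "Kumar is one of the great player in his team",
        "Ravi is a good poet",
        "Ram drives well",
    ],
})
my_list = ["one", "is", "poetry"]  # "poetry" is a hypothetical word with no matches

# Count every whitespace-separated word across all stories, skipping missing values
counts = Counter(word for story in df["Story"].dropna() for word in story.split())

# A Counter returns 0 for keys it has never seen, so absent words show up as 0
new_df = pd.DataFrame({"word": my_list, "count": [counts[w] for w in my_list]})
print(new_df)
```

Counting all words once and then looking up each item keeps the result aligned with my_list, which the filtered Counter above does not guarantee.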