How to get the occurrence counts of a list of keywords in a DataFrame column in Python

Time: 2017-07-25 05:21:59

Tags: python pandas dataframe data-analysis

 my_list=["one","is"]

 df
 Out[6]:
        Name    Story
   0    Kumar   Kumar is one of the great player in his team
   1    Ravi    Ravi is a good poet
   2    Ram     Ram drives well

If any item of my_list appears in the "Story" column, I need the number of occurrences of every item.

 my_desired_output

 new_df
 word     count
 one       1
 is        2

I managed to extract the rows that contain any item of my_list using

mask=df1["Story"].str.contains('|'.join(my_list),na=False)

but now I am trying to get the count of each word in my_list.

1 answer:

Answer 0 (score: 1)

You can first use str.split with stack to get a Series of words:

a = df['Story'].str.split(expand=True).stack()
print (a)
0  0     Kumar
   1        is
   2       one
   3        of
   4       the
   5     great
   6    player
   7        in
   8       his
   9      team
1  0      Ravi
   1        is
   2         a
   3      good
   4      poet
2  0       Ram
   1    drives
   2      well
dtype: object

Then filter by boolean indexing with isin, get the value_counts, and, to obtain a DataFrame, add rename_axis and reset_index:

df = a[a.isin(my_list)].value_counts().rename_axis('word').reset_index(name='count')
print (df)
  word  count
0   is      2
1  one      1
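The steps above can be put together in one self-contained sketch. The DataFrame is rebuilt from the question's sample data, and the keyword "cricket" is a hypothetical addition (not in the original) that shows how reindex can report a count of 0 for keywords that never occur, which value_counts alone would omit.

```python
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({
    "Name": ["Kumar", "Ravi", "Ram"],
    "Story": [
        "Kumar is one of the great player in his team",
        "Ravi is a good poet",
        "Ram drives well",
    ],
})

# "cricket" is a hypothetical keyword that does not occur in the data.
my_list = ["one", "is", "cricket"]

# One word per row: split on whitespace, then stack the columns into a Series.
words = df["Story"].str.split(expand=True).stack()

counts = (
    words[words.isin(my_list)]       # keep only the keywords
    .value_counts()                  # occurrences per keyword
    .reindex(my_list, fill_value=0)  # keep my_list order, add missing keywords as 0
    .rename_axis("word")
    .reset_index(name="count")
)
print(counts)
```

Without the reindex step the result is sorted by count and contains only the keywords that were actually found.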

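Another solution is to create a list of all words with str.split, flatten it with chain.from_iterable, count with Counter, and finally create the DataFrame with the constructor: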

from collections import Counter
from itertools import chain
import pandas as pd

my_list=["one","is"]

a = list(chain.from_iterable(df['Story'].str.split().values.tolist()))
print (a)
['Kumar', 'is', 'one', 'of', 'the', 'great', 'player', 
 'in', 'his', 'team', 'Ravi', 'is', 'a', 'good', 'poet', 'Ram', 'drives', 'well']

b = Counter([x for x in a if x in my_list])
print (b)
Counter({'is': 2, 'one': 1})

df = pd.DataFrame({'word':list(b.keys()),'count':list(b.values())}, columns=['word','count'])
print (df)
  word  count
0  one      1
1   is      2
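Both solutions count whole whitespace-separated words, whereas the str.contains mask from the question matches substrings, so "is" also matches inside "his". As a rough sketch of a third option, not part of the original answer, whole-word counts can also be taken straight from the column with str.count and a regex word boundary. It assumes the keywords are plain words; re.escape guards against regex metacharacters.

```python
import re

import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({
    "Name": ["Kumar", "Ravi", "Ram"],
    "Story": [
        "Kumar is one of the great player in his team",
        "Ravi is a good poet",
        "Ram drives well",
    ],
})
my_list = ["one", "is"]

rows = []
for word in my_list:
    # \b anchors the match at word boundaries, so "his" is not counted for "is".
    pattern = r"\b" + re.escape(word) + r"\b"
    total = int(df["Story"].str.count(pattern).sum())
    rows.append({"word": word, "count": total})

result = pd.DataFrame(rows, columns=["word", "count"])
print(result)

# For comparison: a plain substring count of "is" also picks up "his".
substring_total = int(df["Story"].str.count("is").sum())
print(substring_total)
```

Unlike splitting on whitespace, this variant also counts a keyword that is directly followed by punctuation, such as "is," or "one.".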