Question

I have a list of words and a very large Series. For each word in the list, I want to count how many rows of the Series it appears in.

def example(word_list, series):
    return series.value_counts()

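As far as I can tell, the above only counts total occurrences of each value; it does not count the number of rows in which each word from the list appears. Example: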

the - 6 rows

house - 2 rows

Answer 1

Try something like this:

import pandas as pd
import numpy as np

data = np.array(['hello friend','this','is Anna coming?','hello there!'])

ser = pd.Series(data)


my_l = ['hello', 'is']
d = {}
for word in my_l:
    count = 0
    for s in ser:
        # Pad both sides with spaces so only whole, space-delimited words match
        if (' ' + word + ' ') in (' ' + s + ' '):
            count += 1
    d[word] = count

print(d)

Output:

{'hello': 2, 'is': 1}

Answer 2

Even though 'the' occurs 3 times, it appears in only 2 rows, so the output is 2.
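One way to get a per-row count like this, not necessarily the approach this answer used, is to reduce each row to its set of distinct words before counting. The sketch below assumes the sample rows from the setup quoted in Answer 3; `row_counts` and `result` are illustrative names.

```python
from collections import Counter

import pandas as pd

# Sample rows borrowed from the setup quoted in Answer 3.
df = pd.DataFrame({'data': ['what are you doing', 'give me the the file',
                            'the sun comes up up', 'you and me']})
word_list = ['the', 'up', 'me']

# Turn each row into a set of distinct words,
# so a repeated word is counted once per row.
row_counts = Counter()
for text in df['data']:
    row_counts.update(set(text.split()))

result = {word: row_counts[word] for word in word_list}
print(result)  # {'the': 2, 'up': 1, 'me': 2}
```

'the' occurs three times across the rows but in only two of them, so it is counted as 2.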

Answer 3

Using @Ram's setup:

df = pd.DataFrame(columns=['data'], data=['what are you doing', 'give me the the file', 'the sun comes up up', 'you and me'])
word_list = ['the', 'up', 'me']

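# Split into words, keep each word once per row, then sum the per-row counts by word
# (groupby(level=1).sum() replaces .sum(level=1), which was removed in pandas 2.0)
df['data'].str.split(expand=True).stack().groupby(level=0)\
 .apply(lambda x: x.drop_duplicates().value_counts())\
 .groupby(level=1).sum()[word_list]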

Output:

the    2
up     1
me     2
dtype: int64

Or using @Alex's setup:

data = np.array(['hello friend','this','is Anna coming?','hello there!'])

ser = pd.Series(data)

my_l = ['hello', 'is']

# Same pipeline; groupby(level=1).sum() replaces .sum(level=1), removed in pandas 2.0
ser.str.split(expand=True).stack().groupby(level=0)\
     .apply(lambda x: x.drop_duplicates().value_counts())\
     .groupby(level=1).sum()[my_l]

Output:

hello    2
is       1
dtype: int64
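The same idea can also be sketched with `explode`, which may be easier to read than the stack/groupby chain. This is an alternative under the same assumptions (whitespace-delimited words, @Alex's sample data), not a restatement of the answer's code; the names `words`, `pairs`, and `counts` are illustrative.

```python
import pandas as pd

ser = pd.Series(['hello friend', 'this', 'is Anna coming?', 'hello there!'])
my_l = ['hello', 'is']

# One word per element; the original row label is repeated for each word.
words = ser.str.split().explode()

# Build (row, word) pairs and keep each pair once,
# so a word repeated within a row is counted a single time.
pairs = words.rename('word').rename_axis('row').reset_index().drop_duplicates()

# Count rows per word; words absent from the data get 0 instead of a KeyError.
counts = pairs['word'].value_counts().reindex(my_l, fill_value=0)
print(counts.to_dict())
```

With this data the result should be 2 for 'hello' and 1 for 'is'. The `reindex(..., fill_value=0)` step is a design choice: indexing with `[my_l]` raises if a listed word never occurs.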

Answer 4

Why not simply:

{word: series.str.contains(word).sum() for word in word_list}
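One caveat with this one-liner is that `str.contains` matches substrings, so 'is' is also found inside 'this'. A minimal sketch of the difference, assuming @Alex's sample data and that whole-word matches are wanted (`substring_counts` and `whole_word_counts` are illustrative names):

```python
import re

import pandas as pd

series = pd.Series(['hello friend', 'this', 'is Anna coming?', 'hello there!'])
word_list = ['hello', 'is']

# Plain substring search: 'is' also matches inside 'this'.
substring_counts = {
    w: int(series.str.contains(w, regex=False).sum()) for w in word_list
}

# \b anchors restrict the match to whole words;
# re.escape guards against regex metacharacters in the word.
whole_word_counts = {
    w: int(series.str.contains(rf'\b{re.escape(w)}\b').sum()) for w in word_list
}

print(substring_counts)   # {'hello': 2, 'is': 2}
print(whole_word_counts)  # {'hello': 2, 'is': 1}
```

Because `str.contains` returns one boolean per row, summing it counts rows rather than occurrences, which is what the question asks for.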

Compare a list of words with a Series