Question

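I have a pandas Series in which each element is a list of words. I'm trying to find the frequency of a particular word in each list. For example, the series is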

0    [All, of, my, kids, have, cried, nonstop, when...
1    [We, wanted, to, get, something, to, keep, tra...
2    [My, daughter, had, her, 1st, baby, over, a, y...
3    [One, of, babys, first, and, favorite, books, ...
4    [Very, cute, interactive, book, My, son, loves...

I want to count how many times 'kids' occurs in each row. I tried

series.count('kids')

which gave me an error saying 'Level kids must be same as name (None)', and

series.str.count('kids')

which gives me NaN values.

How should I go about counting?

Answer 1

Use apply with list.count:

In [5288]: series.apply(lambda x: x.count('kids'))
Out[5288]:
0    1
1    0
2    0
3    0
4    0
Name: s, dtype: int64

Details

In [5292]: series
Out[5292]:
0    [All, of, my, kids, have, cried, nonstop, when]
1    [We, wanted, to, get, something, to, keep, tra]
2    [My, daughter, had, her, 1st, baby, over, a, y]
3      [One, of, babys, first, and, favorite, books]
4    [Very, cute, interactive, book, My, son, loves]
Name: s, dtype: object

In [5293]: type(series)
Out[5293]: pandas.core.series.Series

In [5294]: type(series[0])
Out[5294]: list
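
For anyone who wants to reproduce this, here is a minimal self-contained sketch. It assumes the rows are exactly the truncated lists printed above (the full reviews are not shown in the question), and the variable names `counts` and `counts_exploded` are only illustrative. It also helps explain the two failed attempts: `Series.count` counts non-missing values and, in the pandas version used in the question, treats its argument as an index level rather than a value to search for, while the `.str` methods expect string elements, which is why list elements came back as NaN.

```python
import pandas as pd

# Sample data copied from the truncated display above (not the full reviews).
series = pd.Series(
    [
        ["All", "of", "my", "kids", "have", "cried", "nonstop", "when"],
        ["We", "wanted", "to", "get", "something", "to", "keep", "tra"],
        ["My", "daughter", "had", "her", "1st", "baby", "over", "a", "y"],
        ["One", "of", "babys", "first", "and", "favorite", "books"],
        ["Very", "cute", "interactive", "book", "My", "son", "loves"],
    ],
    name="s",
)

# list.count runs once per row; it counts exact, case-sensitive matches only.
counts = series.apply(lambda words: words.count("kids"))

# Alternative: put one word on each row, compare with the target word,
# then sum the matches per original index label.
counts_exploded = series.explode().eq("kids").groupby(level=0).sum()

print(counts.tolist())
print(counts_exploded.tolist())
```

Both approaches should agree on this data. The `apply` version is usually the simpler one to read; the `explode` version may be convenient if you need the per-word rows for other aggregations as well.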

Answer 2

On the original series (the strings, before they were split into lists), use str.findall + str.len:

print(series)

0     All of my kids have cried nonstop when
1     We wanted to get something to keep tra
2      My daughter had her 1st baby over a y
3      One of babys first and favorite books
4    Very cute interactive book My son loves

print(series.str.findall(r'\bkids\b'))

0    [kids]
1        []
2        []
3        []
4        []
dtype: object

counts = series.str.findall(r'\bkids\b').str.len()
print(counts)

0    1
1    0
2    0
3    0
4    0
dtype: int64
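
As a possible shortcut, `Series.str.count` accepts a regular expression, so on a series of strings it should give the same numbers without the intermediate lists. The sketch below assumes the five strings printed above; the names `counts_direct`, `counts_any_case` and `counts_from_tokens` are illustrative only. The last step shows one way to get from a series of word lists (as in the question) back to strings using `str.join`.

```python
import re

import pandas as pd

# Sample strings copied from the printed series above.
series = pd.Series(
    [
        "All of my kids have cried nonstop when",
        "We wanted to get something to keep tra",
        "My daughter had her 1st baby over a y",
        "One of babys first and favorite books",
        "Very cute interactive book My son loves",
    ]
)

# The approach from this answer: collect the matches, then take the list length.
counts_findall = series.str.findall(r"\bkids\b").str.len()

# Same result in one call; \b keeps "kids" from matching inside longer words.
counts_direct = series.str.count(r"\bkids\b")

# Case-insensitive variant, in case "Kids" should be counted too.
counts_any_case = series.str.count(r"\bkids\b", flags=re.IGNORECASE)

# If only the word lists are available, join them back into strings first.
tokens = series.str.split()
counts_from_tokens = tokens.str.join(" ").str.count(r"\bkids\b")

print(counts_direct.tolist())
```

The regex-based methods are more flexible than `list.count` (word boundaries, case flags), at the cost of having to escape any special characters in the search term, for example with `re.escape`.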

Count occurrences of a string in a pandas Series
