Question

I have a list of keywords, let's say:

searchterms = ['Java language',
 'C Language',
 'C++',
 'Python Language',
 'JavaScript',
 'Pascal Language',
 'Statistics',
 'Visual Basic',
 'Objective-C',
 'MATLAB']

I have a pandas DataFrame, which we'll call df, made up of authors and their texts. Each author's text corpus (i.e. a single text entry) is very long.

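Now I want to create a rule. My main goal is to classify users into three categories: beginner, intermediate, or advanced. My idea is to count the occurrences of each search term and create a separate column per term holding that count. Then, if any of the resulting search-term columns meets one of the following conditions, I can label the user as beginner / intermediate / advanced: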

any_search_term < 10              - beginner
any_search_term > 10 and < 50     - intermediate
any_search_term > 50              - advanced
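A minimal pure-Python sketch of this rule, for illustration only. It assumes (the question does not say) that a user's level is decided by their highest count across all search terms, and that the boundary values 10 and 50, which the rule above leaves unassigned, count as intermediate. The helper names `count_terms` and `classify_user` are hypothetical.

```python
def count_terms(text, terms):
    """Return a {term: count} dict of case-sensitive, non-overlapping occurrences."""
    return {term: text.count(term) for term in terms}


def classify_user(counts):
    """Map per-term counts to a level using the highest count (assumed rule)."""
    top = max(counts.values(), default=0)  # no terms at all -> treated as 0
    if top < 10:
        return "beginner"
    if top <= 50:  # assumption: exactly 10 and exactly 50 count as intermediate
        return "intermediate"
    return "advanced"


counts = count_terms("C++ and C++ and Python Language", ["C++", "Python Language"])
print(classify_user(counts))  # prints "beginner" (highest count is 2)
```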

Sample of df:

user      text
a     is home to one of the largest collections of ...
b     Commercial Floor Plans       Sully Statio...
c    I had no idea the issues a code of conduct It’...
d    My week on Twitter  1 Retweet 824K Retweet Re...
e     Tory MPs who guzzle £3000000 in taxpayer fund...

Desired output df, with one count column per search term:

user      text      Java language      C Language      C++      Python Language      ...

Thanks.

Answer 1

for term in searchterms:
    # count the case-sensitive occurrences of each term in every text entry
    df[term] = df['text'].apply(lambda text: text.count(term))
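An end-to-end pandas sketch that builds the per-term count columns and then applies the labelling rule from the question. The toy `df` below is made-up data. `Series.str.count` treats its pattern as a regular expression, so each term is passed through `re.escape` (needed for terms like 'C++'). Reading "any search term" as the row-wise maximum, and treating exactly 10 and 50 as intermediate, are assumptions.

```python
import re

import numpy as np
import pandas as pd

searchterms = ['Java language', 'C Language', 'C++', 'Python Language', 'JavaScript',
               'Pascal Language', 'Statistics', 'Visual Basic', 'Objective-C', 'MATLAB']

# Hypothetical toy data standing in for the real df.
df = pd.DataFrame({
    'user': ['a', 'b', 'c'],
    'text': ['C++ ' * 5, 'MATLAB ' * 20 + 'Statistics', 'JavaScript ' * 60],
})

for term in searchterms:
    # escape regex metacharacters so 'C++' is matched literally
    df[term] = df['text'].str.count(re.escape(term))

# Assumption: a user's level is driven by their most frequent search term.
top = df[searchterms].max(axis=1)
df['level'] = np.select([top < 10, top <= 50], ['beginner', 'intermediate'],
                        default='advanced')

print(df[['user', 'C++', 'MATLAB', 'JavaScript', 'level']])
```

Passing `flags=re.IGNORECASE` to `str.count` would make matching case-insensitive, which may matter because the keyword list mixes "language" and "Language".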

Count the occurrences of words in a pandas df
