How do I count the number of occurrences of a word in text in Python?

Asked: 2018-03-17 08:45:07

Tags: python nltk text-mining stemming lemmatization

I have an Amazon review dataset with 3 variables, as follows: [user_id, product_id, review_text]

I want to know how many reviews contain a word starting with "rec" (say recommend, receive, etc., including their tenses), and how many reviews contain a "rec" word followed by "product" (such as "recommend product", "received product", etc.).

import nltk
from nltk.stem.wordnet import WordNetLemmatizer
from nltk.corpus import wordnet
from nltk.tokenize import word_tokenize
import csv
import numpy as np
import pandas as pd

df = pd.read_csv('Reviews.csv', sep=',')
print(df.head())
Reviews = df[['user_id', 'product_id', 'review_text']]  # user_id is unique, while there are 5 products of each type

# tokenize every review text
for review in Reviews['review_text']:
    tokens = word_tokenize(str(review))
    print(tokens)

I have tokenized all the text. How do I go on from here?

1 Answer:

Answer 0 (score: 0)

If you only want to count "rec" and "recommend + product", you can use regular expressions:

import re

found_0 = 'say recommend, receive etc including their tenses' 
found_1 = 'recommend product, received product etc'

print(re.findall('.*rec.*', found_0))
# this will print the array: 
# ['say recommend, receive etc including their tenses']

print(re.findall('rec[^ ]* product', found_1))
# this will print the array:
# ['recommend product', 'received product']

Then you simply count the size of the list with len(insert_array_here).
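
As a minimal sketch (not part of the original answer), assuming the Reviews.csv file and review_text column from the question, you could apply the same regex idea to every review and count how many reviews match:

import re
import pandas as pd

# assumed file and column names taken from the question
df = pd.read_csv('Reviews.csv', sep=',')
reviews = df['review_text'].astype(str)

# any word starting with "rec" (recommend, received, receiving, ...)
rec_word = re.compile(r'\brec\w*', re.IGNORECASE)
# a "rec..." word immediately followed by "product"
rec_product = re.compile(r'\brec\w*\s+product\b', re.IGNORECASE)

# count reviews (not individual occurrences) that match each pattern
n_rec = sum(1 for text in reviews if rec_word.search(text))
n_rec_product = sum(1 for text in reviews if rec_product.search(text))

print('Reviews containing a "rec" word:', n_rec)
print('Reviews containing "rec... product":', n_rec_product)

If you want the number of matches inside a single review rather than a per-review yes/no, len(rec_word.findall(text)) gives that count.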