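I have created a list of words from a dataframe and removed the stop words from it. I want to create a list of only the words whose frequency is greater than some value n. How can I do that?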
Here is my code to generate the list:
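```python
from itertools import chain

from nltk import FreqDist
from nltk.tokenize import RegexpTokenizer

# wineData (a pandas DataFrame) and stop_words (a set of stop words) are defined elsewhere
tokenizer = RegexpTokenizer(r"\w+(?:[-']\w+)?")
wineData['description'] = wineData['description'].str.lower()
# Tokenize each description and drop the stop words
wineDataTokenized = wineData['description'].apply(
    lambda x: [el for el in tokenizer.tokenize(x) if el not in stop_words])
filteredList = chain.from_iterable(wineDataTokenized)
frequencyList = FreqDist(filteredList)
highFreq = list(frequencyList.keys())  # every distinct word; no frequency filter applied yet
```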
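One way to get only the frequent words is to filter the counts instead of taking all keys. In NLTK 3, `FreqDist` is a subclass of `collections.Counter`, so `frequencyList.items()` yields `(word, count)` pairs that can be filtered directly. The sketch below uses a plain `Counter` so it runs without NLTK; the helper name `words_above` and the sample tokens are hypothetical, chosen for illustration.

```python
from collections import Counter

def words_above(freq, n):
    """Return the words whose count is strictly greater than n (hypothetical helper)."""
    return [word for word, count in freq.items() if count > n]

# Stand-in for frequencyList; a FreqDist would work the same way
tokens = ["wine", "fruit", "wine", "oak", "fruit", "wine"]
freq = Counter(tokens)
high_freq = words_above(freq, 1)
print(high_freq)  # ['wine', 'fruit']
```

With the question's variables this would be `highFreq = words_above(frequencyList, n)`. Use `>=` instead of `>` if words occurring exactly n times should be kept.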
Answer (score: 0)
```python
wordstring = 'it was the best of times it was the worst of times '
wordstring += 'it was the age of wisdom it was the age of foolishness'
wordlist = wordstring.split()

# Count the occurrences of each word (one entry per position in wordlist)
wordfreq = []
for w in wordlist:
    wordfreq.append(wordlist.count(w))

print("String\n" + wordstring + "\n")
print("List\n" + str(wordlist) + "\n")
print("Frequencies\n" + str(wordfreq) + "\n")
# zip() returns an iterator in Python 3, so convert it to a list before printing
print("Pairs\n" + str(list(zip(wordlist, wordfreq))))
```
Source: https://programminghistorian.org/en/lessons/counting-frequencies
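The code above counts words but does not yet apply the threshold n from the question. A minimal sketch of that last step, assuming a threshold of `n = 2` chosen only for illustration, swaps the repeated `list.count` calls (which rescan the whole list for every word) for a single `collections.Counter` pass and then keeps the words whose count exceeds n.

```python
from collections import Counter

wordstring = 'it was the best of times it was the worst of times '
wordstring += 'it was the age of wisdom it was the age of foolishness'
wordlist = wordstring.split()

n = 2  # hypothetical threshold for illustration
wordfreq = Counter(wordlist)  # one pass over the list
# Keep only words that occur more than n times, sorted for stable output
frequent = sorted(w for w, c in wordfreq.items() if c > n)
print(frequent)  # ['it', 'of', 'the', 'was']
```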