Question

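I am trying to print a list of the most common words in a file, but I also want to ignore certain common words. So far I have written this code: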

import csv
import collections
from collections import Counter

with open('billboardtop1002015lyrics.txt',encoding='ISO-8859-1') as csv_file:
    mostcommonword = []

    counter = Counter(csv_file.read().strip().split())

    commonwords = (counter.most_common(30))

    ignore_words = ['i','you','me','the','that','on','is','when','if','in','dont','for','when']

    if commonwords not in ignore_words:
        mostcommonword.append(commonwords)
        print(mostcommonword)

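This does not work: the output still contains words such as "i", "you", and so on. I am very new to Python, and this is the first project I have worked on.

Is there something I am missing?

Thanks!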

Answer 1

You should remove the ignored words first, and only then find the most common words.
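The code in the question fails because `commonwords` is a whole list of `(word, count)` tuples, so `commonwords not in ignore_words` compares that list against individual strings and is always true. Below is a minimal sketch of the filter-then-count approach, not necessarily the exact code this answer originally contained. The helper name `most_common_words`, the lowercasing step, and the sample text are assumptions added for illustration.

```python
from collections import Counter

# Words to skip, taken from the question. A set gives fast membership checks.
IGNORE_WORDS = {'i', 'you', 'me', 'the', 'that', 'on', 'is', 'when',
                'if', 'in', 'dont', 'for'}


def most_common_words(text, n=30, ignore=IGNORE_WORDS):
    """Return the n most frequent words in text, skipping ignored words."""
    # Assumption: lowercase everything so 'The' and 'the' count as one word.
    words = (word for word in text.lower().split() if word not in ignore)
    # Count only the words that survived the filter.
    return Counter(words).most_common(n)


# Small in-memory sample (hypothetical) so the sketch runs without the file.
sample = "you love me and i love the music when the music plays love"
print(most_common_words(sample, n=2))
```

To apply this to the file from the question, open it with `open('billboardtop1002015lyrics.txt', encoding='ISO-8859-1')` as before and pass the result of `read()` to the function.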

