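My problem is that the program seems to be broken from the start. It just prints
[(1, 'C:\Users\....\Desktop\Sense_and_Sensibility.txt')]
over and over, without end.
(Note: the .... was substituted for posting purposes, since my computer username is my full name.)
I'm not sure whether I've written this code completely wrong or whether I'm running into a problem opening the file. Any help is appreciated.
The program should:
1: Open the file, replace all punctuation with spaces, change all words to lowercase, and store them in a dictionary.
2: Look at a list of words (stop words) that are to be removed from the original dictionary.
3: Count the remaining words and sort them by frequency.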
fname = r"C:\Users\....\Desktop\Sense_and_Sensibility.txt"  # file to read
swfilename = r"C:\Users\....\Desktop\stopwords.txt"  # words to delete
with open(fname) as file:  # have the program run the file
    for line in file:  # loop through
        fname.replace('-.,"!?', " ")  # replace punc. with space
        words = fname.lower()  # make all words lowercase
        word_list = fname.split()  # separate the words, store
        word_dict = {}  # create a dictionary
        with open(swfilename) as delete:  # open stop word list
            for line in delete:
                sw_list = swfilename.split()  # separate the words, store them
                sw_dict = {}
                for key in sw_dict:
                    word_dict.pop(key, None)  # delete common words
        for word in word_list:  # loop through
            word_dict[word] = word_dict.get(word, 0) + 1  # count frequency
        word_freq = []  # create index
        for key, value in word_dict.items():  # count occurrences
            word_freq.append((value, key))  # append freq list
        word_freq.sort(reverse=True)  # sort the words by freq
        print(word_freq)  # print most to least
Answer 0: (score: 0)
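Importing files with Python on Windows differs slightly from Mac and Linux, because the backslash used as the path separator is also the escape character in string literals.
Just change the file path from
fname = r"C:\Users\....\Desktop\Sense_and_Sensibility.txt"
to
fname = "C:\\Users\\....\\Desktop\\Sense_and_Sensibility.txt"
using doubled backslashes. (Both literals denote the same path, so this is a matter of style and will not by itself change the output.)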
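To see how the two spellings relate, here is a small check. It uses the elided path from the question verbatim; the variable names raw_path and escaped_path are made up for this illustration.

```python
# Raw-string literal: backslashes are kept exactly as typed.
raw_path = r"C:\Users\....\Desktop\Sense_and_Sensibility.txt"

# Ordinary literal: each doubled backslash stands for one backslash.
escaped_path = "C:\\Users\\....\\Desktop\\Sense_and_Sensibility.txt"

# Both literals produce the same string object contents.
print(raw_path == escaped_path)  # prints True
```

Since the comparison is true, open() receives the same path either way, which suggests the repeated output comes from the loop body rather than from the path.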
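Answer 1: (score: 0)
There are several problems with your code. Since your exact observations cannot be reproduced (readers do not have access to the input you are using), I will only discuss the most obvious ones.
I will first reproduce your code verbatim, marking the weak points with ??? followed by a number, which I explain after the code.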
fname = r"C:\Users\....\Desktop\Sense_and_Sensibility.txt"  # file to read
swfilename = r"C:\Users\....\Desktop\stopwords.txt"  # words to delete
with open(fname) as file:  # ???(1) have the program run the file
    for line in file:  # loop through
        fname.replace('-.,"!?', " ")  # ???(2) replace punc. with space
        words = fname.lower()  # ???(3) make all words lowercase
        word_list = fname.split()  # separate the words, store
        word_dict = {}  # ???(4) create a dictionary
        with open(swfilename) as delete:  # open stop word list
            for line in delete:
                sw_list = swfilename.split()  # separate the words, store them
                sw_dict = {}
                for key in sw_dict:
                    word_dict.pop(key, None)  # ???(5) delete common words
        for word in word_list:  # ???(6) loop through
            word_dict[word] = word_dict.get(word, 0) + 1  # ???(7) count frequency
        word_freq = []  # ???(8) create index
        for key, value in word_dict.items():  # count occurrences
            word_freq.append((value, key))  # append freq list
        word_freq.sort(reverse=True)  # sort the words by freq
        print(word_freq)  # print most to least
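1: file was a built-in name in Python 2 (it is not a reserved word, and Python 3 no longer defines it), but it is still good practice not to reuse such names for your own purposes, as you do here.
2: .replace() replaces the exact string on the left with the exact string on the right, so '-.,"!?' only matches where that whole sequence occurs. What you want is some kind of multi_replace(), which you can implement yourself (for example as a function) by calling .replace() repeatedly, e.g. in a loop (or using functools.reduce()). Note also that .replace() returns a new string, and that result is discarded here.
3: fname contains the file name (actually the path), not the contents of the file you want to work with.
4: Because word_list and word_dict are defined inside the loop, their contents are "overwritten" at each iteration. Also, word_dict is created empty and has not yet been filled at the point where you try to remove the stop words from it.
5: Instead of popping keys, create a filtered_list from word_list that leaves out the stop_words.
6: A dictionary can then be used to implement a counter. I do understand that you may need to learn how to implement a counter yourself, but keep in mind that collections.Counter() in the standard library (hence available via import collections) does exactly what you want.
7: dictionary[key] can be used both to read (which you do not do) and to write (which you do) the value associated with a specific key in a dictionary.
8: Have a look at the key parameter of .sort() and sorted().
Hope this helps!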
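Putting the points above together, a minimal sketch might look like the following. It is one possible arrangement, not the only correct one: the names PUNCTUATION, replace_punctuation, count_words and count_words_in_file are invented for this example, the sample sentence is made up, and the functools.reduce() helper stands in for the multi_replace() idea from point 2.

```python
import collections
import functools

PUNCTUATION = '-.,"!?'  # the characters the question wanted to blank out


def replace_punctuation(text, chars=PUNCTUATION):
    """Replace each character in chars with a space, one .replace() per character."""
    return functools.reduce(lambda s, ch: s.replace(ch, " "), chars, text)


def count_words(text, stop_words):
    """Return (count, word) pairs, most frequent first, skipping stop words."""
    # Work on the text itself, not on the file name.
    words = replace_punctuation(text).lower().split()
    filtered_list = [word for word in words if word not in stop_words]
    counter = collections.Counter(filtered_list)
    # Sort by descending count; ties are broken alphabetically.
    return sorted(
        ((count, word) for word, count in counter.items()),
        key=lambda pair: (-pair[0], pair[1]),
    )


def count_words_in_file(text_path, stop_words_path):
    """Read both files once, outside any loop, then count."""
    with open(stop_words_path) as stop_file:
        stop_words = set(stop_file.read().lower().split())
    with open(text_path) as text_file:
        return count_words(text_file.read(), stop_words)


sample = "The cat sat. The cat ran, and the dog sat!"
print(count_words(sample, {"the", "and"}))
# prints [(2, 'cat'), (2, 'sat'), (1, 'dog'), (1, 'ran')]
```

With the paths from the question, the call would be count_words_in_file(fname, swfilename). Reading each file in a single .read() keeps the counting out of the per-line loop, so nothing is reset or printed once per line.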