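My program opens a file and can count the words it contains, but I want to create a dictionary made up of all the unique words in the text. For example, if the word "computer" appears three times, I want it counted as a single unique word.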
def main():
    file = input('Enter the name of the input file: ')
    infile = open(file, 'r')
    file_contents = infile.read()
    infile.close()
    words = file_contents.split()
    number_of_words = len(words)
    print("There are", number_of_words, "words contained in this paragraph")

main()
Answer 0 (score: 2)
Use a set. It will contain only the unique words:
words = set(words)
If you don't care about case, you can do this:
words = set(word.lower() for word in words)
This assumes there is no punctuation. If there is, you will need to strip it:
import string
words = set(word.lower().strip(string.punctuation) for word in words)
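To see what each step contributes, here is a small sketch on a made-up sentence. The sample text and the names raw_unique, lower_unique and clean_unique are illustrative assumptions, not part of the question:

```python
import string

# Hypothetical sample text standing in for the file contents.
text = "The computer beeped. the Computer, again: computer!"
words = text.split()

# Plain set: "The" and "the" differ, and so do "Computer," and "computer!".
raw_unique = set(words)
# Lowercased: case no longer matters, but attached punctuation still does.
lower_unique = set(word.lower() for word in words)
# Lowercased with leading/trailing punctuation removed.
clean_unique = set(word.lower().strip(string.punctuation) for word in words)

print(len(raw_unique))       # 7
print(len(lower_unique))     # 6
print(sorted(clean_unique))  # ['again', 'beeped', 'computer', 'the']
```

Note that strip only removes punctuation at the ends of each token, so an apostrophe inside a word such as "don't" is kept, and a token made up entirely of punctuation becomes an empty string.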
If you need to keep track of how many times each word occurs, replace set with Counter in the example above:
import string
from collections import Counter
words = Counter(word.lower().strip(string.punctuation) for word in words)
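This gives you a dictionary-like object that tells you how many times each word occurs.
You can also get the number of unique words from it (although it will be slower, if you care about that):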
import string
from collections import Counter
words = Counter(word.lower().strip(string.punctuation) for word in words)
nword = len(words)
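As a rough illustration of what the Counter gives you, the sketch below uses a made-up word list (the sample string and the name counts are assumptions for the example) and shows a few lookups:

```python
import string
from collections import Counter

# Hypothetical sample input; in the question this would be file_contents.split().
words = "Foo bar, foo baz. Baz foo!".split()
counts = Counter(word.lower().strip(string.punctuation) for word in words)

print(counts["foo"])          # 3: occurrences of one word
print(counts["missing"])      # 0: absent words count as zero instead of raising KeyError
print(counts.most_common(1))  # [('foo', 3)]: the most frequent word with its count
print(len(counts))            # 3: number of unique words
```

Looking up a missing word returns 0 without adding it to the Counter, so len(counts) still reflects only the words that were actually seen.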
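Answer 1 (score: 0)
@TheBlackCat's solution works, but it only tells you how many unique words the string/file contains. This solution also shows how many times each word appears.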
dictionaryName = {}
for word in words:
    # Membership can be tested on the dict directly; no need to build a list first.
    if word not in dictionaryName:
        dictionaryName[word] = 1
    else:
        dictionaryName[word] = dictionaryName[word] + 1
print(dictionaryName)
Test:
words = "Foo", "Bar", "Baz", "Baz"
output: {'Foo': 1, 'Bar': 1, 'Baz': 2}
Answer 2 (score: 0)
A possibly cleaner and faster solution:
words_dict = {}
for word in words:
    # get() returns 0 the first time a word is seen.
    word_count = words_dict.get(word, 0)
    words_dict[word] = word_count + 1
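For completeness, here is a minimal sketch that combines the get() loop above with the lowercasing and punctuation stripping suggested in the first answer. The function name count_words and the sample string are hypothetical; the function takes text rather than a file name so it can be tried without a file:

```python
import string

def count_words(text):
    """Return a dict mapping each normalized word to its number of occurrences."""
    counts = {}
    for word in text.split():
        # Normalize: lowercase, and drop punctuation from both ends of the token.
        word = word.lower().strip(string.punctuation)
        if not word:
            continue  # the token was punctuation only
        counts[word] = counts.get(word, 0) + 1
    return counts

# Hypothetical sample text; in the question's program this would be file_contents.
sample = "Computer, computer - COMPUTER! A test."
result = count_words(sample)
print(result)                        # {'computer': 3, 'a': 1, 'test': 1}
print("Unique words:", len(result))  # Unique words: 3
```

In the question's program, passing file_contents to such a function would give both the dictionary of unique words and, through len(), the number of unique words.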