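My project is very basic: I need to take a text file of the Gettysburg Address and count the number of words and the number of unique words. I've gotten all the way to the end, but it double-counts words that differ only in the capitalization of the first letter, e.g. "But" and "but". I'm not sure how to fix this :( Here is what I have so far: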
def main():
    getty = open('Gettysburgaddress.txt','r')
    lines = getty.readlines()
    getty.close()
    index = 0
    while index < len(lines):
        lines[index] = lines[index].rstrip('\n')
        index += 1
    words = [word for line in lines for word in line.split()]
    size = len(words)
    print('\nThere are', size,'words in the Gettysburg Address.\n')
    unique = list(set(words))
    size_unique = len(unique)
    print('There are', size_unique,'unique words in the Gettysburg Address.\n')
    unique.sort()
    print('Sorted order of unique words:', unique)
    close = input('')
main()
Answer 0 (score: 3)
Lowercase the words as you collect them:
words = [word.lower() for line in lines for word in line.split()]
Or lowercase them when you build the set of unique words:
unique = list(set(word.lower() for word in words))
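To see why lowercasing fixes the count, here is a small sketch using a made-up list of words (the sample list is an assumption for illustration, not taken from the file). A set compares strings exactly, so 'But' and 'but' count as two entries until both are lowercased:

```python
# Hypothetical sample; the real list comes from the file.
words = ['But', 'in', 'a', 'larger', 'sense', 'but', 'we']

# Exact comparison: 'But' and 'but' are different strings.
case_sensitive = set(words)

# Lowercase each word first, so both collapse into 'but'.
case_insensitive = set(word.lower() for word in words)

print(len(case_sensitive))    # 7
print(len(case_insensitive))  # 6
```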
You can also simplify the file-loading code a little:
with open('Gettysburgaddress.txt','r') as getty:
    words = [word.lower() for line in getty for word in line.split()]
This loads the file into a list of lowercased words in one step, and the with
statement also takes care of closing the file again.
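Putting the pieces together, the counting could be sketched roughly as below. The helper name `count_words` and the sample lines are made up for this example; the helper accepts any iterable of lines so it can be tried without the file. Like the original code, it splits only on whitespace.

```python
def count_words(lines):
    """Return (lowercased words, sorted unique words) for an iterable of lines."""
    # split() with no arguments splits on any whitespace, so the trailing
    # newline on each line is dropped without a separate rstrip step.
    words = [word.lower() for line in lines for word in line.split()]
    unique = sorted(set(words))
    return words, unique


# Stand-in lines for illustration; an open file object works the same way.
sample = ['But in a larger sense\n', 'but we can not dedicate\n']
words, unique = count_words(sample)
print('There are', len(words), 'words.')
print('There are', len(unique), 'unique words.')
print('Sorted order of unique words:', unique)
```

With the real file, the open file object from the with statement can be passed in directly in place of `sample`, since iterating over a file yields its lines.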