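I'm trying to create a simple word counter program in Python 3.4.1, in which the user enters a comma-separated list of words and the frequency of each word is then counted in a sample text file.
At the moment I'm only concerned with how to search the text file for the entered list of words.
First I tried: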
file = input("What file would you like to open? ")
f = open(file, 'r')
search = input("Enter the words you want to search for (separate with commas): ").lower().split(",")
search = [x.strip(' ') for x in search]
count = {}
for word in search:
    count[word] = count.get(word,0)+1
for word in sorted(count):
    print(word, count[word])
This resulted in:
What file would you like to open? twelve_days_of_fast_food.txt
Enter the words you want to search for (separate with commas): first, rings, the
first 1
rings 1
the 1
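If anything, I guess this approach only gives me the count of each word within the input list itself, not the count of the input words in the text file. So I tried: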
file = input("What file would you like to open? ")
f = open(file, 'r')
lines = f.readlines()
line = f.readline()
word = line.split()
search = input("Enter the words you want to search for (separate with commas): ").lower().split(",")
search = [x.strip(' ') for x in search]
count = {}
for word in lines:
    if word in search:
        count[word] = count.get(word,0)+1
for word in sorted(count):
    print(word, count[word])
This gave me nothing back. This is what happened:
What file would you like to open? twelve_days_of_fast_food.txt
Enter the words you want to search for (separate with commas): first, the, rings
>>>
What am I doing wrong? How can I fix this?
Answer 0 (score: 1)
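You read all the lines first (into lines), and then try to read just one more line, but the file has already given you all of its lines. In that case f.readline() gives you an empty string. From there, your script is doomed to fail; you cannot count words in an empty line. On top of that, the loop goes over lines, whose items are whole lines rather than single words, so the test against search never matches.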
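The effect can be reproduced without a real file. This is a minimal sketch that uses io.StringIO as a stand-in for the opened file; the sample text is made up for illustration.

```python
import io

# io.StringIO stands in for the opened text file (the contents are made up)
f = io.StringIO("on the first day\nfive onion rings\n")

lines = f.readlines()   # consumes the whole file
line = f.readline()     # nothing is left, so this is an empty string

print(lines)
print(repr(line))

# each element of lines is a whole line, newline included, so it can
# never be equal to a single search word
search = ["first", "rings", "the"]
matches = [entry for entry in lines if entry in search]
print(matches)
```

Both line and matches come out empty here, which is consistent with the script printing nothing.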
You can loop over the file instead:
file = input("What file would you like to open? ")
search = input("Enter the words you want to search for (separate with commas): ")
search = [word.strip() for word in search.lower().split(",")]
# create a dictionary for all search words, setting each count to 0
count = dict.fromkeys(search, 0)
with open(file, 'r') as f:
    for line in f:
        for word in line.lower().split():
            if word in count:
                # found a word you wanted to count, so count it
                count[word] += 1
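The with statement uses the opened file object as a context manager; this just means the file is closed again automatically when the block is done.
The for line in f: loop iterates over each separate line in the input file; this is more efficient than reading all the lines into memory at once with f.readlines().
I also cleaned up your search-word stripping a little, and set the count dictionary to one in which all the search words are pre-defined with a count of 0; this makes the actual counting a little easier.
Because you now have a dictionary containing all the search words, testing for matching words is best done against that dictionary. Testing against a dictionary is faster than testing against a list: a list test is a scan that takes longer the more words the list holds, while a dictionary test takes constant time on average, regardless of the number of items in the dictionary.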
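The code in this answer stops short of printing the counts, and str.split() leaves punctuation attached to words, so "rings," would not match "rings". The following is a sketch of one way to round it off, not part of the original answer: count_words is a hypothetical helper, the punctuation stripping is an assumption about what the asker wants, and the sample lines are made up.

```python
import string

def count_words(lines, search):
    """Count how often each word in search occurs in an iterable of lines."""
    count = dict.fromkeys(search, 0)
    for line in lines:
        for word in line.lower().split():
            # assumption: leading/trailing punctuation should not block a match
            word = word.strip(string.punctuation)
            if word in count:
                count[word] += 1
    return count

# made-up sample lines standing in for the text file
sample = ["On the first day of Christmas,", "five onion rings, the end."]
result = count_words(sample, ["first", "rings", "the"])
for word in sorted(result):
    print(word, result[word])
```

Any iterable of lines works, including an open file object, so count_words(f, search) could be called inside the with block instead of passing a list.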
Answer 1 (score: 1)
You can try this:
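import re
import collections
wanted = ["cat", "dog"]
# read the whole file, lowercase it, and extract runs of word characters
with open('hamlet.txt') as f:
    matches = re.findall(r'\w+', f.read().lower())
counts = collections.Counter(matches)  # count each occurrence of every word
# in Python 3, map() is lazy and prints nothing, so print explicitly
for word in wanted:
    print(word, counts[word])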
I referred to this solution when forming my answer.
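For readers unfamiliar with collections.Counter, here is a small self-contained sketch of the same idea on an in-memory string, so it runs without hamlet.txt. The sample text and the extra search word "bird" are made up; the point to notice is that a Counter returns 0 for words it never saw rather than raising KeyError.

```python
import collections
import re

text = "The cat sat. The dog barked at the cat!"  # made-up sample text
wanted = ["cat", "dog", "bird"]

# lowercase the text and pull out runs of word characters
words = re.findall(r"\w+", text.lower())
counts = collections.Counter(words)

# missing keys yield 0 instead of raising KeyError
wanted_counts = {word: counts[word] for word in wanted}
print(wanted_counts)
```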
Answer 2 (score: 0)
string = "once upon atime"
string2 = "hello pig upon"
word = string.split()
word2 = string2.split()
# print every word of the first string that also occurs in the second
for X in range(len(word)):
    for Y in range(len(word2)):
        if word[X] == word2[Y]:
            print(word[X])
            break  # stop scanning word2 once a match is found
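If the goal is only to find the words two strings share, the nested loops can be replaced by a set intersection. This is an alternative sketch, not the answerer's method; it ignores duplicates and the original word order.

```python
string = "once upon atime"
string2 = "hello pig upon"

# a set intersection keeps only the words present in both strings
common = set(string.split()) & set(string2.split())

for word in sorted(common):
    print(word)
```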