How do I search a text file for a user-entered list of words?

Asked: 2014-11-23 03:20:56

Tags: python word-count word-frequency

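I am trying to create a simple word counter program in Python 3.4.1, where the user enters a comma-separated list of words whose frequency is then analysed in a sample text file.

For now I am only concerned with how to search the text file for the entered list of words.

First I tried: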

file = input("What file would you like to open? ")
f = open(file, 'r')
search = input("Enter the words you want to search for (separate with commas): ").lower().split(",")
search = [x.strip(' ') for x in search]
count = {}
for word in search:
    count[word] = count.get(word,0)+1
for word in sorted(count):
    print(word, count[word])

This resulted in:

What file would you like to open? twelve_days_of_fast_food.txt
Enter the words you want to search for (separate with commas): first, rings, the
first 1
rings 1
the 1

If anything, I guess this approach only gives me the count of the words in the input list, not the count of those words in the text file. So I tried:

file = input("What file would you like to open? ")
f = open(file, 'r')
lines = f.readlines()
line = f.readline()
word = line.split()
search = input("Enter the words you want to search for (separate with commas): ").lower().split(",")
search = [x.strip(' ') for x in search]
count = {}
for word in lines:
    if word in search:
        count[word] = count.get(word,0)+1
for word in sorted(count):
    print(word, count[word])

This returned nothing. This is what happened:

What file would you like to open? twelve_days_of_fast_food.txt
Enter the words you want to search for (separate with commas): first, the, rings
>>> 

What am I doing wrong? How can I fix this?

3 answers:

Answer 0 (score: 1)

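You read all the lines first (into lines), then try to read just one more line, but by then the file has already given you everything. In that case f.readline() returns an empty string. From there on your script is doomed to fail; you cannot count words in an empty line. On top of that, the loop iterates over whole lines rather than words, and each line still carries its trailing newline, so the word in search test never matches.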
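To see why the second attempt prints nothing, here is a minimal sketch that uses an in-memory io.StringIO object as a stand-in for the opened file. The sample text is made up for illustration and is not the contents of the asker's file.

```python
import io

# io.StringIO behaves like an opened text file; the contents are invented sample data
f = io.StringIO("On the first day\nfive onion rings\n")

lines = f.readlines()   # consumes the whole file
line = f.readline()     # nothing is left, so this is the empty string

print(lines)            # each entry is a whole line, newline included
print(repr(line))

search = ["first", "rings", "the"]
# A whole line (with its trailing "\n") is never equal to a single search word
hits = [entry for entry in lines if entry in search]
print(hits)
```

Because hits stays empty, the count dictionary in the question is never filled and the final loop has nothing to print.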

You can loop over the file instead:

file = input("What file would you like to open? ")

search = input("Enter the words you want to search for (separate with commas): ")
search = [word.strip() for word in search.lower().split(",")]

# create a dictionary for all search words, setting each count to 0
count = dict.fromkeys(search, 0)

with open(file, 'r') as f:
    for line in f:
        for word in line.lower().split():
            if word in count:
                # found a word you wanted to count, so count it
                count[word] += 1

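The with statement uses the opened file object as a context manager; this simply means the file is closed again automatically once the block is done.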
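The automatic closing can be checked with a small sketch like the one below. It writes a throwaway temporary file so that it is self-contained; the file contents and variable names are assumptions made for the example.

```python
import os
import tempfile

# Create a throwaway file so the example is self-contained (contents are sample data)
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("five onion rings\n")
    path = tmp.name

with open(path, "r") as f:
    first_line = f.readline()
    open_inside = not f.closed   # the file is still open inside the block

closed_after = f.closed          # leaving the block closed the file
print(open_inside, closed_after)

os.remove(path)                  # clean up the temporary file
```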

for line in f: loops over each individual line of the input file; this is more efficient than using f.readlines() to read all lines into memory at once.

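I also cleaned up the handling of your search words, and set up the count dictionary so that every search word is predefined with a count of 0; this makes the actual counting a little easier.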
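Putting these pieces together, the counting step could be wrapped in a function along the lines of the sketch below. The name count_words and the sample lines are hypothetical; any iterable of lines, including an open file object, can be passed in.

```python
def count_words(lines, search):
    """Count how often each word in `search` occurs in `lines` (case-insensitive)."""
    # Start every search word at 0 so words that never occur are still reported
    count = dict.fromkeys(search, 0)
    for line in lines:
        for word in line.lower().split():
            if word in count:
                count[word] += 1
    return count


# Hypothetical sample data standing in for the text file
sample = ["On the first day of Christmas", "The five onion rings", "and the fries"]
search = [w.strip() for w in "first, rings, the, burger".lower().split(",")]

result = count_words(sample, search)
for word in sorted(result):
    print(word, result[word])
```

Note that split() only separates on whitespace, so a token such as "rings," with attached punctuation would not be counted as "rings" by this sketch.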

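Because you now have a dictionary containing all the search words, it is best to test for matching words against that dictionary. Testing against a dictionary is faster than testing against a list: the latter is a scan that takes longer the more words there are in the list, while a dictionary test takes constant time on average, regardless of the number of items in the dictionary.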
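A rough way to observe the difference is to time both membership tests with timeit, as in this sketch. The 100,000 generated words are invented test data, and the measured times depend on the machine, so no particular figures are claimed here.

```python
import timeit

# Hypothetical search words: 100,000 made-up entries
words = ["word{}".format(i) for i in range(100000)]
as_list = words
as_dict = dict.fromkeys(words, 0)

target = "word99999"   # last item: the worst case for the list scan

in_list = target in as_list   # scans the list item by item
in_dict = target in as_dict   # hash lookup

# Repeat each membership test 100 times and report the total time taken
list_time = timeit.timeit(lambda: target in as_list, number=100)
dict_time = timeit.timeit(lambda: target in as_dict, number=100)
print("list: {:.6f}s  dict: {:.6f}s".format(list_time, dict_time))
```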

Answer 1 (score: 1)

You can try this:

import re
import collections

wanted = ["cat", "dog"]
# Read the whole file, lowercase it, and extract every run of word characters
with open('hamlet.txt') as f:
    matches = re.findall(r'\w+', f.read().lower())
counts = collections.Counter(matches)  # count each occurrence of every word
# Print the counts for the wanted words only
for word in wanted:
    print(word, counts[word])

I referred to this solution when writing this answer.
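The same idea can be tried without a file, as in this minimal sketch. The text is a made-up sample, and "bird" is included only to show what happens for a word that never occurs.

```python
import collections
import re

# Made-up sample text standing in for the contents of the file
text = "The cat sat. The dog barked at the cat!"
wanted = ["cat", "dog", "bird"]

# \w+ matches runs of word characters, so attached punctuation is dropped
words = re.findall(r"\w+", text.lower())
counts = collections.Counter(words)

# A Counter returns 0 for missing keys instead of raising KeyError
wanted_counts = {word: counts[word] for word in wanted}
print(wanted_counts)
```

Unlike splitting on whitespace, this approach still counts "cat!" as "cat", at the cost of reading the whole text into memory at once.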

Answer 2 (score: 0)

Hope this helps you get it working:

string = "once upon atime"
string2 = "hello pig upon"
word = string.split()
word2 = string2.split()

# Compare every word of the first string with every word of the second
for x in word:
    for y in word2:
        if x == y:
            print(x)
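As an alternative to the nested loops, the words shared by the two strings can be found with a set intersection. This is a swapped-in technique, not what the answer itself uses, and it reports each shared word only once.

```python
string = "once upon atime"
string2 = "hello pig upon"

# Words present in both strings; sets discard duplicates and ordering
common = set(string.split()) & set(string2.split())
print(sorted(common))
```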