Question

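For my Python homework, the assignment is: "Write a complete Python program that reads the file trash.txt and outputs the number of times Bob appears in the file."

My code is: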

count=0
f=open('trash.txt','r')
bob_in_trash=f.readlines()
for line in bob_in_trash:
    if "Bob" in line:
        count=count+1
print(count)
f.close()

Is there a way to make this code more efficient? It correctly counts 5, but I'd like to know whether there is anything I could change.

Answer 1

You can simply read the entire file and count the occurrences of "Bob":

data = open('trash.txt').read()
count = data.count('Bob')

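While this is fine for smaller files, loading the entire file into memory can become a problem when you work with larger files.

Reading it line by line is still more memory-efficient, but use str.count instead of "Bob" in line (the latter only tells you how many lines contain "Bob").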

count = 0  # must be initialized before using +=
with open('trash.txt') as f:
    for line in f:
        count += line.count("Bob")
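To make the difference between the two checks concrete, here is a minimal sketch that runs both on made-up sample lines. The sample text and the helper names count_lines_containing and count_occurrences are my own assumptions for illustration; they are not part of the assignment.

```python
def count_lines_containing(lines, word):
    """Count lines that contain `word` at least once (the `in` approach)."""
    return sum(1 for line in lines if word in line)


def count_occurrences(lines, word):
    """Count every non-overlapping occurrence of `word` (the str.count approach)."""
    return sum(line.count(word) for line in lines)


# Made-up sample data standing in for the contents of trash.txt.
sample_lines = [
    "Bob met Bob at the store\n",
    "Nobody was there\n",
    "Bobby and Bob left\n",
]

lines_with_bob = count_lines_containing(sample_lines, "Bob")
total_bobs = count_occurrences(sample_lines, "Bob")
print(lines_with_bob, total_bobs)
```

Note that both approaches match substrings, so the "Bob" inside "Bobby" is counted as well.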

Answer 2

This way you always count at most one "Bob" per line. How about using the count method, so that you add up however many occurrences each line has:

for line in bob_in_trash:
    count=count+line.count("Bob")

Answer 3

For more flexibility, use a regular expression to distinguish between bob, Bob, bobcat, and so on.

import re
with open('trash.txt','r') as f:
   count = sum(len(re.findall( r'\bbob\b', line)) for line in f)

Options:

r'\bbob\b'      # matches bob
r'(?i)\bbob\b'  # matches bob, Bob
r'bob'          # matches bob and the bob in bobcat (case-sensitive, so not Bob)
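If you want to check what each pattern picks up, a quick sketch like the following works. The sample sentence and the dictionary labels are made up for illustration.

```python
import re

# Made-up sample text; not taken from trash.txt.
text = "bob and Bob saw a bobcat; BOB waved."

patterns = {
    "exact lowercase word": r'\bbob\b',
    "whole word, any case": r'(?i)\bbob\b',
    "substring, lowercase": r'bob',
}

# Count the matches each pattern finds in the sample text.
counts = {label: len(re.findall(p, text)) for label, p in patterns.items()}
for label, n in counts.items():
    print(f"{label}: {n}")
```

The plain r'bob' pattern is case-sensitive, so it finds the bob inside bobcat but skips Bob and BOB.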

Answer 4

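>>> count = 0
>>> abuffer = bytearray(4096)
>>> with open('trash.txt', 'rb') as fp:  # readinto() requires binary mode
...    while True:
...        n = fp.readinto(abuffer)  # number of bytes actually read
...        if n == 0:
...            break
...        # search only the bytes just read; a "Bob" split across two chunks is missed
...        count += abuffer.count(b'Bob', 0, n)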
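One caveat with fixed-size chunks is that a "Bob" can straddle the boundary between two reads. Below is a minimal sketch of one way to handle that by carrying over the last few bytes of each chunk. It uses read() rather than readinto() for brevity, and the function name count_in_chunks and the in-memory test data are my own assumptions.

```python
import io


def count_in_chunks(stream, needle, chunk_size=4096):
    """Count non-overlapping occurrences of `needle` (bytes) in a binary stream,
    reading at most `chunk_size` bytes at a time."""
    if not needle:
        raise ValueError("needle must be non-empty")
    count = 0
    tail = b""  # unmatched bytes carried over from the previous chunk
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        window = tail + chunk
        pos = 0
        while True:
            hit = window.find(needle, pos)
            if hit == -1:
                break
            count += 1
            pos = hit + len(needle)
        # Keep at most len(needle) - 1 trailing bytes that were not part of a
        # match, so a needle split across the chunk boundary is still found.
        keep_from = max(pos, len(window) - (len(needle) - 1))
        tail = window[keep_from:]
    return count


# Made-up data; a tiny chunk size forces matches to straddle chunk boundaries.
data = b"Bob, Bobby and Bob"
chunked = count_in_chunks(io.BytesIO(data), b"Bob", chunk_size=4)
print(chunked, data.count(b"Bob"))
```

For a real file you would pass open('trash.txt', 'rb') as the stream instead of the io.BytesIO object.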

Answer 5

Since you only want whole words, it is better to use a regular expression:

import re

i = 0
with open('trash.txt','r') as file:
    for result in re.finditer(r'\bBob\b', file.read()):
        i += 1
print('Number of Bobs in file: ' + str(i))

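Note that the regular expression is \bBob\b, where the \b at the start and end means that Bob must be a whole word rather than part of a word. Also, I use finditer instead of findall, because the former yields matches one at a time rather than building a full list, so it uses less memory on large files.

To save even more memory, combine this with reading line by line: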

i = 0
with open('trash.txt','r') as file:
    for line in file:
        for result in re.finditer(r'\bBob\b', line):
            i += 1
print('Number of Bobs in file: ' + str(i))
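The same idea can be wrapped in a small reusable function. This is only a sketch: the name count_word and the sample lines are made up, and it assumes the word consists of ordinary word characters so that \b behaves as expected on both sides.

```python
import re


def count_word(lines, word):
    """Count whole-word occurrences of `word` across an iterable of lines."""
    # re.escape guards against regex metacharacters in the search word.
    pattern = re.compile(r'\b' + re.escape(word) + r'\b')
    return sum(1 for line in lines for _ in pattern.finditer(line))


# Made-up sample lines; with a real file, pass the open file object instead.
sample_lines = ["Bob and Bobby\n", "Bob, Bob!\n", "bob\n"]
total = count_word(sample_lines, "Bob")
print('Number of Bobs in sample: ' + str(total))
```

Because a file object iterates line by line, calling count_word(f, "Bob") inside a with open('trash.txt') as f: block keeps only one line in memory at a time.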
