Question

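I am working on an exercise from the book Python for Informatics that asks me to write a program simulating the UNIX grep command. However, my code doesn't work. Here I have simplified it so that it only counts the lines that start with the word "From". I'm confused and hope you can shed some light on it.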

from urllib.request import urlopen
import re

fhand = urlopen('http://www.py4inf.com/code/mbox-short.txt')
sumFind = 0

for line in fhand:
    line = str(line) #convert from byte to string for re operation
    if re.search('^From',line) is not None:
        sumFind+=1

print(f'There are {sumFind} lines that match.')

The output of the script is

There are 0 lines that match.

Here is the link to the input text: text

Thank you very much for your time.

Answer 1

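The mistake is using str to convert bytes to a string. That returns the printable representation of the bytes object, not the decoded text: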

>>> str(b'foo')
"b'foo'"

You would need

line = line.decode()
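
To see why the original loop never matches, here is a minimal sketch comparing str() and decode() on a single line. The sample line is made up for illustration and is not taken from the input file.

```python
import re

# Hypothetical sample line, shaped like the bytes that urlopen yields.
raw = b"From alice@example.com Sat Jan  5 09:14:16 2008\n"

as_repr = str(raw)      # the repr of the bytes object; it begins with b'
as_text = raw.decode()  # real text (decode() defaults to UTF-8)

# The repr starts with the characters b and ', so ^From cannot match.
repr_matches = re.search('^From', as_repr) is not None
# The decoded text starts with From, so the same pattern matches.
text_matches = re.search('^From', as_text) is not None

print(repr_matches, text_matches)  # prints: False True
```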

But the best way is to pass a bytes pattern to the regex function, which is supported:

for line in fhand:
    if re.search(b'^From',line) is not None:
        sumFind+=1

Now I get 54 matches.

Note that you can reduce the whole loop to:

sum_find = sum(bool(re.match(b'From',line)) for line in fhand)

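re.match removes the need for ^ with re.search, because it only matches at the start of the string.
No explicit loop is needed: sum counts how many times re.match returns a truthy value (explicitly converted to bool so that it adds 0 or 1).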

Or even simpler, without a regex:

sum_find = sum(line.startswith(b"From") for line in fhand)
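
The three counting variants above can be checked against each other without a network connection. This sketch assumes a small made-up sample wrapped in io.BytesIO as a stand-in for the urlopen response; the count_* helper names are hypothetical.

```python
import io
import re

# Made-up sample standing in for the downloaded file (an assumption).
SAMPLE = (
    b"From alice@example.com Sat Jan  5 09:14:16 2008\n"
    b"Return-Path: <postmaster@example.com>\n"
    b"From: alice@example.com\n"
    b"Subject: not From here\n"
)

def count_search(lines):
    # re.search needs the ^ anchor to restrict the match to the line start.
    return sum(1 for line in lines if re.search(b'^From', line))

def count_match(lines):
    # re.match is implicitly anchored at the start of the string.
    return sum(bool(re.match(b'From', line)) for line in lines)

def count_startswith(lines):
    # No regex at all: bytes.startswith returns True/False, which sum adds as 1/0.
    return sum(line.startswith(b"From") for line in lines)

# Iterating a BytesIO yields bytes lines, like iterating the HTTP response.
counts = [f(io.BytesIO(SAMPLE))
          for f in (count_search, count_match, count_startswith)]
print(counts)  # prints: [2, 2, 2]
```

Both the "From " envelope line and the "From:" header line start with From, so every variant counts them both. Using b"From " with a trailing space would count only the former.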

Answer 2

The problem is that the urllib module returns bytes rather than strings when reading from the URL / text file.

You can:

Use bytes in the regex search: re.search(b'^From', line).
Use the requests module to download the file as a string and split it into lines:

import requests

txt = requests.get('http://www.py4inf.com/code/mbox-short.txt').text.split('\n')

for line in txt: ...
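
Once the content is a str, ordinary string patterns and methods work on each line. A minimal sketch of that second option, using a made-up string in place of the text that requests would return (an assumption, so that it runs offline):

```python
# Made-up stand-in for requests.get(url).text (an assumption, not real data).
text = (
    "From alice@example.com Sat Jan  5 09:14:16 2008\n"
    "From: alice@example.com\n"
    "Subject: hi\n"
)

# splitlines() drops the line endings and does not leave a trailing empty
# string, which split('\n') does when the text ends with a newline.
lines = text.splitlines()

# str lines can be tested with a str prefix directly.
sum_find = sum(line.startswith("From") for line in lines)
print(f'There are {sum_find} lines that match.')
```

With split('\n') the last element here would be an empty string. It does not start with "From", so the count is the same either way.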

Regular expression in Python doesn't work