Question

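I'm trying to read a file 1 MB at a time and then return the data matched by a regular expression. This is the code I'm using, but I get no output, even though I know those characters exist in the file: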

if __name__ == "__main__":
    import sys
    import re
    filename = sys.argv[1]
    def filemessage(filename, chunk_size=1000000):
        while True:
            data = filename.read(chunk_size)
            if not data: break
            yield data  
            regex = re.compile('abc')
            if regex.findall(data) == True:   
                print (regex)
            else:                   
                continue

Any suggestions? Thanks.

Answer 1

re.findall returns a list of the matched text. That list will never compare equal to True, because a list is never equal to a bool:

>>> [1, 2, 3] == True
False
>>>

Just remove the == True:

if regex.findall(data):

In fact, problems like this are exactly why PEP 8 discourages writing == True or == False in the condition of an if statement.
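To make the difference between comparing with True and testing truthiness concrete, here is a minimal sketch. The sample strings are made up for illustration; only re.compile and findall come from the question.

```python
import re

regex = re.compile("abc")

hits = regex.findall("xxabcxxabc")   # list of matched strings
misses = regex.findall("no match")   # empty list when nothing matches

print(hits == True)    # False: a list never compares equal to a bool
print(bool(hits))      # True: a non-empty list is truthy
print(bool(misses))    # False: an empty list is falsy
```

This is why `if regex.findall(data):` behaves as intended while `if regex.findall(data) == True:` never enters the branch.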

Note, however, that you are discarding the list returned by re.findall. Perhaps you meant to do this:

if regex.match(data): # Matches from start of string
# or
if regex.search(data): # Matches anywhere in string
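A short sketch of how the two calls differ, using an illustrative string in which "abc" does not sit at the start:

```python
import re

regex = re.compile("abc")
data = "xxabcxx"  # hypothetical sample text

# match only succeeds if the pattern matches at position 0
print(regex.match(data))  # None

# search scans the whole string and returns the first match object
found = regex.search(data)
if found:
    print(found.group(), found.start())  # abc 2
```

Both return a match object or None, so either can be used directly as an if condition.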

Edit

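Your function appears to run very fast because it is a generator (it yields data). This means Python does not actually run the code inside filemessage until you start consuming the generator. So you are only timing how long it takes to build the generator, not how long it takes to execute it.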
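The lazy behaviour can be seen in a small sketch. The generator `chunks` and the `events` list are hypothetical names used only for this illustration.

```python
events = []

def chunks():
    events.append("body started")
    yield "first"
    # Code placed after a yield runs only when the consumer asks for more
    events.append("resumed after first yield")
    yield "second"

gen = chunks()               # creating the generator runs none of the body
started_early = list(events) # still empty at this point

first = next(gen)            # runs the body up to the first yield
rest = list(gen)             # resumes and runs to the end

print(started_early, first, rest, events)
```

In the question's code the regex check sits after `yield data`, so it also runs only when the caller requests the next chunk.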

You can exhaust the generator with a for loop:

if __name__ == "__main__":
    import sys
    import re
    filename = sys.argv[1]
    def filemessage(filename, chunk_size=1000000):
        while True:
            data = filename.read(chunk_size)
            if not data: break
            yield data
            regex = re.compile('abc')
            if regex.search(data):
                print (regex)

    for data in filemessage(...):
        # do something with data
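Putting the pieces together, one possible shape for the whole task is sketched below. It is not the original poster's code: the names `read_chunks` and `count_matches` are hypothetical, the file is opened in binary mode (an assumption, since the question mentions a raw file), and the pattern is therefore a bytes pattern.

```python
import re

def read_chunks(path, chunk_size=1_000_000):
    """Yield successive chunks of at most chunk_size bytes from the file at path."""
    with open(path, "rb") as handle:
        while True:
            data = handle.read(chunk_size)
            if not data:
                break
            yield data

def count_matches(path, pattern=rb"abc", chunk_size=1_000_000):
    """Count pattern matches chunk by chunk (matches split across chunks are missed)."""
    regex = re.compile(pattern)
    return sum(len(regex.findall(chunk)) for chunk in read_chunks(path, chunk_size))

# Example call, assuming the path is given on the command line:
#     import sys
#     print(count_matches(sys.argv[1]))
```

Note that a match straddling the boundary between two chunks is not found by this simple approach, because each chunk is searched on its own.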
