Question

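I am trying to run a script on all *.txt files in the current directory. At the moment it processes only test.txt and prints blocks of text based on a regular expression. What is the quickest way to scan the current directory for *.txt files and run the script on every one it finds? Also, how can I include the lines containing 'word1' and 'word3'? The current script prints only the content between those two lines, and I want to print the whole block.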

#!/usr/bin/env python
import os, re
file = 'test.txt'
with open(file) as fp:
    # re.S lets '.' match newlines, so a block can span several lines
    for result in re.findall('word1(.*?)word3', fp.read(), re.S):
        print(result)

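If you have any suggestions on how to improve the code above, in particular its speed when running on a large number of text files, I would appreciate them. Thanks.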
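The second part of the question comes down to how re.findall treats groups: when the pattern contains a capturing group, findall returns only the group's text, not the whole match. A minimal sketch on an in-memory string (the sample text is hypothetical, standing in for test.txt) shows both behaviours:

```python
import re

# Hypothetical sample text standing in for the contents of test.txt
text = "intro\nline with word1 here\nmiddle line\nline with word3 here\noutro\n"

# With a capturing group, findall returns only what the group matched
inner = re.findall('word1(.*?)word3', text, re.S)

# Without a group, findall returns the whole match; anchoring on line
# boundaries (re.M makes ^ and $ work per line) pulls in the complete
# lines that contain word1 and word3
whole = re.findall(r'^[^\n]*word1.*?word3[^\n]*$', text, re.S | re.M)

print(inner)
print(whole)
```

Here inner holds only ' here\nmiddle line\nline with ', while whole holds the full block from "line with word1 here" through "line with word3 here".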

Answer 1

Use glob.glob:

import os, re
import glob

# Compile the pattern once and reuse it for every file
pattern = re.compile('word1(.*?)word3', flags=re.S)
for file in glob.glob('*.txt'):
    with open(file) as fp:
        for result in pattern.findall(fp.read()):
            print(result)
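Answer 1 still prints only the text between the markers. A minimal Python 3 sketch that combines the glob loop with a group-free, line-anchored pattern, so each printed block includes the lines containing 'word1' and 'word3'. The names BLOCK, blocks_in_file and main are hypothetical, chosen for illustration, and the files are assumed to be readable with the platform's default encoding:

```python
import glob
import re

# From the start of the line containing 'word1' through the end of the line
# containing 'word3'; no capturing group, so findall returns whole blocks
BLOCK = re.compile(r'^[^\n]*word1.*?word3[^\n]*$', re.S | re.M)


def blocks_in_file(filename):
    """Return every word1...word3 block (as complete lines) in one file."""
    with open(filename) as fp:
        return BLOCK.findall(fp.read())


def main(pattern='*.txt'):
    # sorted() only makes the processing order deterministic
    for filename in sorted(glob.glob(pattern)):
        for block in blocks_in_file(filename):
            print(block)
```

Calling main() processes every *.txt file in the current directory; passing a different pattern, such as main('logs/*.txt'), points it elsewhere. The pattern is compiled once at module level, so each file costs one read and one findall pass.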

Answer 2

Inspired by falsetru's answer, I rewrote my code to make it more generic.

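The files to explore can now:

- be described either by a string passed as the second argument, which glob() will use, or, if the desired set of files can't be described with a glob-style pattern, by a function written specifically for that purpose;

- be located in the current directory if no third argument is passed, or in a specified directory if its path is passed as the third argument.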

import re, glob
from os import getcwd, listdir, path
from inspect import isfunction

# From the start of the line containing 'word1' to the end of the line
# containing 'word3' (re.S: '.' crosses newlines; re.M: ^ and $ match per line)
regx = re.compile('^[^\n]*word1.*?word3.*?$', re.S | re.M)

G = ('\n\n'
     'MWMWMWMWMWMWMWMWMWMWMWMWMWMWMWMWMWMWMWMWMW\n'
     'MWMWMW %s\n'
     'MWMWMW %s\n'
     '%s%s')

def search(REGX, how_to_find_files, dirpath='', G=G,
           sepm='\n======================\n'):
    if dirpath == '':
        dirpath = getcwd()
    if isfunction(how_to_find_files):
        # listdir() returns bare names: join them with dirpath so that
        # isfile() and open() look in the right directory
        gen = (path.join(dirpath, name) for name in listdir(dirpath)
               if how_to_find_files(name)
               and path.isfile(path.join(dirpath, name)))
    elif isinstance(how_to_find_files, str):
        gen = glob.glob(path.join(dirpath, how_to_find_files))
    else:
        raise TypeError('how_to_find_files must be a function or a glob pattern')
    for fn in gen:
        with open(fn) as fp:
            found = REGX.findall(fp.read())
        if found:
            # header with directory and file name, then the blocks joined by sepm
            yield G % (dirpath, path.basename(fn), sepm, sepm.join(found))

# Example of searching in .txt files
#============ one use ===================
def select(fn):
    return fn.endswith('.txt')

print(''.join(search(regx, select)))

#============= another use ==============
print(''.join(search(regx, '*.txt')))
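For comparison, a minimal Python 3 sketch of just the file-selection step, using pathlib instead of glob and listdir. The function iter_files and its parameters are hypothetical and not part of the answer above. Like search(), it accepts either a glob pattern or a predicate called with each file name, and it looks in the current directory unless a directory is given:

```python
from pathlib import Path


def iter_files(selector, dirpath=None):
    """Return the regular files in dirpath (default: current directory)
    chosen either by a glob pattern string or by a predicate on the name."""
    base = Path(dirpath) if dirpath is not None else Path.cwd()
    if isinstance(selector, str):
        candidates = base.glob(selector)
    elif callable(selector):
        candidates = (p for p in base.iterdir() if selector(p.name))
    else:
        raise TypeError('selector must be a glob pattern or a callable')
    # Keep regular files only (a directory can match '*.txt' too), sorted
    return sorted(p for p in candidates if p.is_file())
```

Both iter_files('*.txt') and iter_files(lambda name: name.endswith('.txt')) select the same files. Because pathlib builds full paths from base, the bare-name pitfall of listdir() does not arise.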

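The advantage of processing several files through a chain of generators is that the final ''.join() builds a single string that is written out in one go,
whereas printing several separate strings one after another would take longer because of the repeated interruptions to the display (does that make sense?).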
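The point above is about issuing one write call instead of many. A minimal sketch of the two approaches, writing to any file-like object; write_all and write_each are hypothetical names. Both produce identical output, and whether the single write is measurably faster depends on how the stream is buffered, so this shows the mechanics rather than a benchmark:

```python
import sys


def write_all(chunks, out=sys.stdout):
    """Join every result string first, then hand the text to the stream once."""
    text = ''.join(chunks)
    out.write(text)
    return len(text)


def write_each(chunks, out=sys.stdout):
    """Write the result strings one by one, as separate print calls would."""
    for chunk in chunks:
        out.write(chunk)
```

The trade-off: write_all must hold the entire output in memory before anything appears, while write_each streams each piece as soon as the generator produces it.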

How to run a script on all *.txt files in the current directory?
