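I am trying to list every text file in a directory that contains a match for any one of several regular expressions.
Here is an example regular expression, which searches a file for phone numbers: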
#Search for Phone Numbers
regex2 =r'\d\d\d[-]\d\d\d[-]\d\d\d\d'
Here is my code for getting all the files, but I am not sure where the regular expression should go.
import glob
# Raw strings keep the backslashes in the Windows path literal
folder_path = r"C:\Temp"
file_pattern = r"\*.txt"
search_string = "hello"
match_list = []
folder_contents = glob.glob(folder_path + file_pattern)
for file in folder_contents:
    print("Checking", file)
    # Read the whole file and close it again right away
    with open(file, 'rt') as f:
        read_file = f.read()
    if search_string in read_file:
        match_list.append(file)
print("Files containing search string")
for file in match_list:
    print(file)
Here is another way I go through all the txt files in my directory:
import glob
import errno
path = '/home//*.txt' #note C:
files = glob.glob(path)
for name in files:
    with open(name) as f:
        for line in f:
            split = line.split()
            if split:
                print(split)
I tried putting my regular expression into each of the if statements above, but I get errors. Any ideas?
Answer 0: (score: 0)
import re
# Define your regex
regex2 = re.compile(r'\d\d\d[-]\d\d\d[-]\d\d\d\d')
# Read files...
# Check if we have matches in the file content
matches = regex2.findall(read_file)
if matches:
    match_list.append(file)
    print('file:', file)
    print('matches:', matches)
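
Since the question asks about several regular expressions rather than just one, here is a minimal sketch of how the snippet above might be extended. It is only one possible approach: the function name find_matching_files, the PATTERNS dictionary, and the email pattern are made up for illustration and do not come from the question; only the phone-number pattern does (written with {3} and {4} counts, which match the same text). It also assumes the files can be read as UTF-8.

```python
import glob
import os
import re

# Hypothetical pattern set: the phone-number regex from the question plus
# an illustrative email pattern that is NOT part of the original post.
PATTERNS = {
    "phone": re.compile(r"\d{3}-\d{3}-\d{4}"),
    "email": re.compile(r"[\w.+-]+@[\w-]+\.\w+"),
}


def find_matching_files(folder, patterns=PATTERNS, file_pattern="*.txt"):
    """Return {path: {pattern name: [matches]}} for files with at least one match."""
    results = {}
    for path in sorted(glob.glob(os.path.join(folder, file_pattern))):
        # Assumption: files are UTF-8; undecodable bytes are replaced, not fatal
        with open(path, "rt", encoding="utf-8", errors="replace") as f:
            text = f.read()
        hits = {}
        for name, regex in patterns.items():
            found = regex.findall(text)
            if found:
                hits[name] = found
        # Keep the file if ANY of the patterns matched
        if hits:
            results[path] = hits
    return results


if __name__ == "__main__":
    for path, hits in find_matching_files(r"C:\Temp").items():
        print("file:", path)
        for name, found in hits.items():
            print("  ", name, "->", found)
```

Keeping the compiled patterns in a dictionary means each match can be reported under a readable name, and adding another regex is a one-line change. If you only need the list of file names, list(find_matching_files(folder)) gives you that.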