My directory:
path = "C:\\Users\\alopes\\afolder\\"
Files ending in .proc:
infile = glob.glob(os.path.join(path, '*.proc'))
Updated code:
import re
import os
import glob
import numpy as np
from itertools import dropwhile
pklist = []
# regex for packets
regTel = re.compile(r'[A-Z_]+[.][A-Z0-9_]+')
path = "C:\\Users\\alopes\\afolder\\procs\\"
infile = glob.glob(os.path.join(path, '*.proc'))
for j in infile:
    with open(j, "r") as fobj:
        # skip the leading lines that start with ";(C)"
        dp = dropwhile(lambda x: x.startswith(";(C)"), fobj)
        for line in dp:
            m = regTel.search(line)
            if m:
                print(m.group())
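I tried the following to put m into another list. The goal is to collect all the matches from each file into a list that will be used elsewhere:
for n in m:
    pklist.append(n)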
Answer 0 (score: 0)
You can use itertools.dropwhile to skip the leading lines that start with ;(C), then search each remaining line:
import re
from itertools import dropwhile

infile = "C:\\Users\\alopes\\afolder\\doc_name.ext"
with open(infile) as f:
    regTel = re.compile(r'[A-Z_]+[.][A-Z0-9_]+')
    # dropwhile only discards the leading run of ";(C)" lines
    for line in dropwhile(lambda x: x.lstrip().startswith(";(C)"), f):
        m = regTel.search(line)
        if m:
            print(m.group())
Output:
HELLO_WORLD.THIS_IS_1_TEST
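One thing worth spelling out: dropwhile only discards lines while the predicate keeps returning true, so a ;(C) line that appears later in the file is not skipped. A minimal sketch of that behaviour, assuming made-up file content held in an io.StringIO (the sample lines and the names sample, reg_tel and matches are hypothetical, not from the question):

```python
import io
import re
from itertools import dropwhile

# Hypothetical sample content standing in for a .proc file.
sample = io.StringIO(
    ";(C) header line one\n"
    ";(C) header line two\n"
    "CALL HELLO_WORLD.THIS_IS_1_TEST\n"
    ";(C) SKIPPED_LATER.NOT_REALLY\n"
)

reg_tel = re.compile(r'[A-Z_]+[.][A-Z0-9_]+')

matches = []
# Only the two leading ";(C)" lines are dropped; the last line is still searched.
for line in dropwhile(lambda x: x.lstrip().startswith(";(C)"), sample):
    m = reg_tel.search(line)
    if m:
        matches.append(m.group())

print(matches)
```

Here the final ;(C) line still produces a match, which is probably not what you want if comments can occur anywhere in the file.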
If you want to run it against multiple files and get all the matches:
import re
from itertools import dropwhile

def yield_matches(fles, ign):
    regTel = re.compile(r'[A-Z_]+[.][A-Z0-9_]+')
    for fl in fles:
        with open(fl) as f:
            # skip the leading lines of each file that start with ign
            for line in dropwhile(lambda x: x.lstrip().startswith(ign), f):
                m = regTel.search(line)
                if m:
                    yield m.group()
If the comments can appear anywhere, iterate over the files with fileinput.input and use str.startswith to skip each comment line:
import fileinput
import re

def yield_matches(fles, ign):
    regTel = re.compile(r'[A-Z_]+[.][A-Z0-9_]+')
    # fileinput.input chains the lines of every file in fles
    for line in fileinput.input(fles):
        if not line.lstrip().startswith(ign):
            m = regTel.search(line)
            if m:
                yield m.group()
Just call the function, passing it a list of filenames and the string to hand to startswith:
l = some_list_of_files
for i in yield_matches(l, ";(C)"):
    print(i)
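Since the goal in the question is a list of matches per file, something along these lines might be closer to what you need. This is only a sketch: matches_per_file, REG_TEL and the temporary demo files are hypothetical names and data invented for illustration, and like the code above it keeps only the first match on each line because it uses search.

```python
import os
import re
import tempfile
from itertools import dropwhile

REG_TEL = re.compile(r'[A-Z_]+[.][A-Z0-9_]+')

def matches_per_file(paths, ign):
    """Return {path: [matches]}, skipping the leading lines that start with ign."""
    result = {}
    for p in paths:
        with open(p) as f:
            body = dropwhile(lambda x: x.lstrip().startswith(ign), f)
            # search() yields at most one match per line
            result[p] = [m.group() for m in map(REG_TEL.search, body) if m]
    return result

# Hypothetical demo files written to a temporary directory.
tmpdir = tempfile.mkdtemp()
contents = {
    "a.proc": ";(C) header\nSEND PKT_A.FIELD_1\n",
    "b.proc": "PKT_B.FIELD_2\nno match here\nPKT_C.X9\n",
}
paths = []
for name, text in sorted(contents.items()):
    p = os.path.join(tmpdir, name)
    with open(p, "w") as fh:
        fh.write(text)
    paths.append(p)

per_file = matches_per_file(paths, ";(C)")
# Flatten into a single list if the per-file grouping is not needed.
pklist = [pk for p in paths for pk in per_file[p]]
print(pklist)
```

If a flat list is all you need, list(yield_matches(l, ";(C)")) with the generator above does the same job without the dictionary.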
Answer 1 (score: 0)
You can use a regular expression:
re.compile(r'^(?!\s*;[(]C[)]).*?([A-Z_]+[.][A-Z0-9_]+)', re.MULTILINE)
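^ anchors to the start of the line
(?!\s*;[(]C[)]) is a negative lookahead: "not followed by ;(C)"
.*? consumes the remaining characters up to
([A-Z_]+[.][A-Z0-9_]+), which is enclosed in parentheses to create a group and make findall()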