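I'm still learning Python and very new to regex. I'm trying to pull information out of a text file and put it into a list for later processing.
Here is a sample Python file: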
import re
text = '''name = file details
version = v1.2
;----------------
; Notes on line one
; Notes on line two
;
; Notes on line four, skipping line 3
;--------------
configuring this device
configuring that device
; I don't want this note'''
def notes(path):
    file = re.split('\n+', path)
    outputName = outputVer = outputNote = ''
    notes = []
    outputNotes = []
    for line in file:
        name = re.search('^name = (.*)$', line)
        ver = re.search('^version = (.*)$', line)
        note = re.search('; (.*)', line)
        if name:
            outputName = name.group(1)
        if ver:
            outputVer = ver.group(1)
        notes.append(note)
    for note in notes:
        print(note)
    info = (outputName, outputVer, outputNotes)
    print(info[2])
    for notes in info[2]:
        if notes:
            print(notes)
    print(info)
notes(text)
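What I want is to get the "name", "version", and "notes".
I can get the name and version without a problem; the notes are what I'm struggling with. For the notes, I want everything between the ;--------- markers. I don't want the note that comes later in the file.
Basically, I want the output to look like: ('file details', 'v1.2', ['Notes on line one', 'Notes on line two', '', 'Notes on line four, skipping line 3'])
Also, I'm sure there are ways to optimize this, and I'd be interested in hearing suggestions.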
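Answer 0 (score: 0)
If I understand your problem statement, you are just reading a varying number of lines at the top of the file. There's no reason to use regular expressions at all: read the name and version with two lines of code, then read the header start line (';---'), then loop, reading lines into a list until you see the header end line (';---').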
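A minimal sketch of that line-by-line approach, assuming the layout shown in the question: name on the first line, version on the second, and the notes between two lines that start with ";-". The function name parse_header is just for illustration. Because blank ";" lines are kept, this reproduces the exact output the question asks for.

```python
def parse_header(text):
    """Read name, version, and the notes between the two ';-' marker lines.

    Assumes the question's layout: 'name = ...' first, 'version = ...'
    second, then a block of notes between two ';---' lines.
    """
    lines = iter(text.splitlines())
    # Lines 1 and 2 look like "key = value"; keep the part after " = ".
    name = next(lines).split(" = ", 1)[1]
    version = next(lines).split(" = ", 1)[1]

    notes = []
    in_block = False
    for line in lines:
        if line.startswith(";-"):
            if in_block:
                break  # closing marker: ignore everything after it
            in_block = True  # opening marker: start collecting
            continue
        if in_block:
            # Drop leading ';' characters and surrounding spaces; a bare ';' gives ''.
            notes.append(line.lstrip(";").strip())
    return name, version, notes


sample = """name = file details
version = v1.2
;----------------
; Notes on line one
; Notes on line two
;
; Notes on line four, skipping line 3
;--------------
configuring this device
configuring that device
; I don't want this note"""

print(parse_header(sample))
# ('file details', 'v1.2', ['Notes on line one', 'Notes on line two', '', 'Notes on line four, skipping line 3'])
```

Stopping at the closing marker with break is what keeps the later "; I don't want this note" line out of the result.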
Answer 1 (score: 0)
(?:^;-+$)(.*?)(?:^;-+$)
See the demo on regex101.com.
Or, as a complete walkthrough:
import re
text = _your_string_  # placeholder: the sample text from the question
def notes():
    lines = re.split('\n', text)
    for line in lines:
        if line.startswith('name'):
            name = re.search(r"^name = (.*)", line)
            if name:
                outputName = name.group(1)
        elif line.startswith('version'):
            version = re.search(r"^version = (.*)", line)
            if version:
                outputVer = version.group(1)
    # now the notes part: MULTILINE makes ^/$ match at each line, DOTALL lets . cross newlines
    notes = re.search(r"(?:^;-+$)(.*?)(?:^;-+$)", text, re.MULTILINE | re.DOTALL)
    # split on newline plus optional ';', drop the empty pieces, trim spaces
    outputNotes = [x.strip() for x in re.split(r'\n;?', notes.group(1)) if x]
    info = [outputName, outputVer, outputNotes]
    return info
info = notes()
print(info)
# ['file details', 'v1.2', ['Notes on line one', 'Notes on line two', 'Notes on line four, skipping line 3']]
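The walkthrough above filters out empty pieces, so the bare ";" line disappears, while the question asked to keep it as ''. A sketch of one way to keep it, reusing the same notes pattern but splitting the captured block on line breaks instead of filtering; the names NOTES_RE and parse_notes are hypothetical.

```python
import re

# Same pattern as above: the block between two lines made of ';' plus dashes.
NOTES_RE = re.compile(r"(?:^;-+$)(.*?)(?:^;-+$)", re.MULTILINE | re.DOTALL)


def parse_notes(text):
    name = re.search(r"^name = (.*)$", text, re.MULTILINE).group(1)
    version = re.search(r"^version = (.*)$", text, re.MULTILINE).group(1)
    block = NOTES_RE.search(text).group(1)
    # The captured block begins and ends with a newline; trim those, then
    # keep every line, so a bare ';' becomes an empty string.
    notes = [line.lstrip(";").strip() for line in block.strip("\n").splitlines()]
    return name, version, notes


sample = """name = file details
version = v1.2
;----------------
; Notes on line one
; Notes on line two
;
; Notes on line four, skipping line 3
;--------------
configuring this device
configuring that device
; I don't want this note"""

print(parse_notes(sample))
# ('file details', 'v1.2', ['Notes on line one', 'Notes on line two', '', 'Notes on line four, skipping line 3'])
```

Like the original, this raises AttributeError if any of the three searches finds nothing, so check for None first if the input may be malformed.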
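Answer 2 (score: 0)
This takes a mix of approaches, as shown below. I use named capture groups to extract the notes first, then apply a second regex to select the text inside the ;----- markers while skipping lines that contain nothing but ;.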
import re
txt = '''name = file details
version = v1.2
;----------------
; Notes on line one
; Notes on line two
;
; Notes on line four, skipping line 3
;--------------
configuring this device
configuring that device
; I don't want this note'''
data = re.search(r'name\s*=\s*(?P<name>.*)\W*version\s*=\s*(?P<version>.*)\W*(?:;-+\W)(?P<notes>[\w\W]*)(?:;-+\W)',txt)
print(data.group('name'))  # prints name
print(data.group('version'))  # prints version
# print(data.group('notes'))
print([i.strip(';') for i in re.findall(r';\s*[^;]{2,}', data.group('notes'))])  # prints notes
Output:
file details
v1.2
[' Notes on line one\n', ' Notes on line two\n', ' Notes on line four, skipping line 3\n']
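As the output shows, strip(';') leaves the leading space and the trailing newline on each note. A small sketch of one fix, assuming you want the notes trimmed: strip the ';' first, then the surrounding whitespace. The notes_block string is a stand-in for data.group('notes') above; the bare ";" line is still skipped, because [^;]{2,} needs at least two characters.

```python
import re

# Stand-in for data.group('notes') from the code above.
notes_block = "; Notes on line one\n; Notes on line two\n;\n; Notes on line four, skipping line 3\n"

# lstrip(';') drops the marker; strip() then drops the space and trailing newline.
clean = [i.lstrip(';').strip() for i in re.findall(r';\s*[^;]{2,}', notes_block)]
print(clean)
# ['Notes on line one', 'Notes on line two', 'Notes on line four, skipping line 3']
```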
See the details of the first regex HERE.