Question

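I'm still learning Python and very new to regex. I'm trying to pull information out of a text file and put it into a list for later processing.

Here is a sample Python file: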

import re

text = '''name = file details
version = v1.2
;----------------
; Notes on line one
; Notes on line two
;
; Notes on line four, skipping line 3
;--------------
configuring this device
configuring that device
; I don't want this note'''



def notes(path):
    file = re.split('\n+', path)
    outputName = outputVer = outputNote = ''
    notes = []
    outputNotes = []
    for line in file:
        name = re.search('^name = (.*)$', line)
        ver = re.search('^version = (.*)$', line)
        note = re.search('; (.*)', line)
        if name:
            outputName = name.group(1)
        if ver:
            outputVer  = ver.group(1)
        notes.append(note)
    for note in notes:
        print(note)



    info = (outputName, outputVer, outputNotes)
    print(info[2])

    for notes in info[2]:
        if notes:
            print(notes)

    print(info)


notes(text)

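What I want is to get the "name", "version", and "notes".

I can get the name and version without any problem; the notes are where I'm stuck. For the notes, I want everything between the ;--------- markers. I don't want the notes that appear later in the file.

Basically, I want the output to look like: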

('file details', 'v1.2', ['Notes on line one', 'Notes on line two', '','Notes on line four, skipping line 3'])

Also, I'm sure there are ways to optimize this, and I'm interested in hearing suggestions.

Answer 1

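If I understand your problem statement, you are just reading a variable number of lines at the top of a file. There is no reason to use regular expressions at all: read the name and version with two lines of code, then read the header start line (';---'), then loop, reading lines into an array until you see the header end line (';---').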
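A minimal sketch of that idea, assuming the name and version are always the first two lines and that both markers start with ';---'. The function name parse_header and the variable sample are made up for illustration; they are not from the question.

```python
def parse_header(text):
    """Return (name, version, notes) without using regular expressions."""
    lines = iter(text.splitlines())

    # Assumption: the first two lines are "name = ..." and "version = ...".
    name = next(lines).partition('=')[2].strip()
    version = next(lines).partition('=')[2].strip()

    # Skip ahead to the opening ';---' marker.
    for line in lines:
        if line.startswith(';---'):
            break

    # Collect note lines until the closing ';---' marker.
    notes = []
    for line in lines:
        if line.startswith(';---'):
            break
        # Drop the leading ';' and surrounding whitespace; a bare ';' becomes ''.
        notes.append(line.lstrip(';').strip())

    return name, version, notes


sample = '''name = file details
version = v1.2
;----------------
; Notes on line one
; Notes on line two
;
; Notes on line four, skipping line 3
;--------------
configuring this device
configuring that device
; I don't want this note'''

print(parse_header(sample))
```

Because both loops share one iterator, the second loop picks up right after the opening marker, so nothing after the closing marker is ever read.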

Answer 2

Use the MULTILINE and DOTALL modes:

(?:^;-+$)(.*?)(?:^;-+$)

See the demo on regex101.com, or here as a complete walkthrough:

import re

text = _your_string_  # placeholder: substitute the sample text from the question

def notes():
    lines = re.split('\n', text)
    for line in lines:
        if line.startswith('name'):
            name = re.search(r"^name = (.*)", line)
            if (name):
                outputName = name.group(1)
        elif line.startswith('version'):
            version = re.search(r"^version = (.*)", line)
            if (version):
                outputVer = version.group(1)

    # now the notes part
    notes = re.search(r"(?:^;-+$)(.*?)(?:^;-+$)", text, re.MULTILINE|re.DOTALL)
    outputNotes = [x.strip() for x in re.split(r'\n;?', notes.group(1)) if x]
    info = [outputName, outputVer, outputNotes]
    return info

info = notes()
print(info)
# ['file details', 'v1.2', ['Notes on line one', 'Notes on line two', 'Notes on line four, skipping line 3']]
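
Note that the list above drops the empty note line, whereas the question's expected output keeps it as ''. A possible variant that keeps it, under the same MULTILINE and DOTALL approach, might look like the sketch below. The names extract_notes and sample are hypothetical, and the pattern is adjusted slightly to consume the newline after the opening marker.

```python
import re


def extract_notes(text):
    """Return the note lines between the two ;--- markers, keeping blank ones."""
    match = re.search(r"^;-+$\n(.*?)^;-+$", text, re.MULTILINE | re.DOTALL)
    if match is None:
        return []
    # Drop the leading ';' from each line and trim whitespace;
    # an empty note line (a bare ';') becomes ''.
    return [line.lstrip(';').strip() for line in match.group(1).splitlines()]


sample = '''name = file details
version = v1.2
;----------------
; Notes on line one
; Notes on line two
;
; Notes on line four, skipping line 3
;--------------
configuring this device
configuring that device
; I don't want this note'''

print(extract_notes(sample))
```

Splitting with splitlines() instead of filtering out empty strings is what preserves the blank entry.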

Answer 3

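This needs a mix of approaches, as shown below. I use named capture groups, and to extract the notes I apply a regex twice: first to select the text inside the ;----- markers, and then to pick out the lines that contain actual text rather than just a bare ;.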

import re

txt = '''name = file details
version = v1.2
;----------------
; Notes on line one
; Notes on line two
;
; Notes on line four, skipping line 3
;--------------
configuring this device
configuring that device
; I don't want this note'''
data = re.search(r'name\s*=\s*(?P<name>.*)\W*version\s*=\s*(?P<version>.*)\W*(?:;-+\W)(?P<notes>[\w\W]*)(?:;-+\W)',txt)
print(data.group('name'))     # prints name
print(data.group('version'))  # prints version
# print(data.group('notes'))
# prints notes; each item keeps its leading space and trailing newline
print([i.strip(';') for i in re.findall(r';\s*[^;]{2,}', data.group('notes'))])

Output -

file details
v1.2
[' Notes on line one\n', ' Notes on line two\n', ' Notes on line four, skipping line 3\n']

See the details of the first regex HERE
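
For readers without the linked breakdown, the first regex can also be read piece by piece when written with re.VERBOSE. This is only my annotation of the same pattern; the names HEADER_RE and sample are made up for illustration.

```python
import re

# Same pattern as the first regex above, spread out with comments.
# In VERBOSE mode, whitespace in the pattern and '#' comments are ignored.
HEADER_RE = re.compile(r'''
    name\s*=\s*(?P<name>.*)        # "name = ..." up to the end of that line
    \W*                            # newline(s) before the next field
    version\s*=\s*(?P<version>.*)  # "version = ..." up to the end of that line
    \W*                            # newline(s) before the opening marker
    (?:;-+\W)                      # opening ;----- marker plus its newline
    (?P<notes>[\w\W]*)             # greedy: everything up to the last marker
    (?:;-+\W)                      # closing ;----- marker plus its newline
''', re.VERBOSE)

sample = '''name = file details
version = v1.2
;----------------
; Notes on line one
; Notes on line two
;
; Notes on line four, skipping line 3
;--------------
configuring this device
configuring that device
; I don't want this note'''

m = HEADER_RE.search(sample)
print(m.group('name'))
print(m.group('version'))
print(m.group('notes').splitlines())
```

Since [\w\W]* is greedy, the notes group runs to the last ;----- marker in the text, so this relies on there being exactly two such markers.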

Python regex and file I/O