Python regex and file I/O

Date: 2016-01-24 17:01:52

Tags: python regex

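I'm still learning Python and am very new to regex. I'm trying to pull information out of a text file and put it into a list for later processing.

Here is a sample Python file: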

import re

text = '''name = file details
version = v1.2
;----------------
; Notes on line one
; Notes on line two
;
; Notes on line four, skipping line 3
;--------------
configuring this device
configuring that device
; I don't want this note'''



def notes(path):
    file = re.split('\n+', path)
    outputName = outputVer = outputNote = ''
    notes = []
    outputNotes = []
    for line in file:
        name = re.search('^name = (.*)$', line)
        ver = re.search('^version = (.*)$', line)
        note = re.search('; (.*)', line)
        if name:
            outputName = name.group(1)
        if ver:
            outputVer  = ver.group(1)
        notes.append(note)
    for note in notes:
        print(note)



    info = (outputName, outputVer, outputNotes)
    print(info[2])

    for notes in info[2]:
        if notes:
            print(notes)

    print(info)


notes(text)

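What I want is to get the "name", "version" and "notes".

I can get the name and version without any trouble; the notes are where I'm stuck. For the notes, I want everything between the ;--------- markers. I don't want the notes that appear later in the file.

Basically, I want the output to look like: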

('file details', 'v1.2', ['Notes on line one', 'Notes on line two', '','Notes on line four, skipping line 3'])

Also, I'm sure there are ways to optimize this, and I'd be interested in hearing suggestions.

3 answers:

Answer 0 (score: 0)

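If I understand your problem statement, you are just reading a variable number of lines at the top of the file. There is no reason to use regular expressions at all: read the name and version with two lines of code, then read the header start line (';---'), then loop, reading lines into an array until you see the header end line (';---').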
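The steps described in this answer can be sketched as follows. This is only a minimal illustration, not the answerer's own code: the function name `parse_header` and the variable `sample` are made up for the example, and it assumes the first two lines are always the name and version lines, in that order.

```python
def parse_header(text):
    """Return (name, version, notes) using plain string methods only."""
    lines = iter(text.splitlines())
    # Assumption: the first two lines are "name = ..." and "version = ...".
    name = next(lines).split('=', 1)[1].strip()
    version = next(lines).split('=', 1)[1].strip()

    notes = []
    in_block = False
    for line in lines:
        if line.startswith(';---'):
            if in_block:
                break          # closing marker: stop reading
            in_block = True    # opening marker: start collecting
        elif in_block:
            # Drop the leading ';' and surrounding whitespace;
            # a bare ';' line becomes an empty string.
            notes.append(line.lstrip(';').strip())
    return name, version, notes


sample = '''name = file details
version = v1.2
;----------------
; Notes on line one
; Notes on line two
;
; Notes on line four, skipping line 3
;--------------
configuring this device
configuring that device
; I don't want this note'''

result = parse_header(sample)
print(result)
```

Because the loop stops at the closing marker, the note at the end of the file is never read. To work on a real file, the text could be read first with `open(path).read()`, where `path` is whatever file you are parsing.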

Answer 1 (score: 0)

Use the MULTILINE and DOTALL modes:

(?:^;-+$)(.*?)(?:^;-+$)

See the demo on regex101.com, or here as a complete walkthrough:

import re

text = _your_string_

def notes():
    lines = re.split('\n', text)
    for line in lines:
        if line.startswith('name'):
            name = re.search(r"^name = (.*)", line)
            if (name):
                outputName = name.group(1)
        elif line.startswith('version'):
            version = re.search(r"^version = (.*)", line)
            if (version):
                outputVer = version.group(1)

    # now the notes part
    notes = re.search(r"(?:^;-+$)(.*?)(?:^;-+$)", text, re.MULTILINE|re.DOTALL)
    outputNotes = [x.strip() for x in re.split(r'\n;?', notes.group(1)) if x]
    info = [outputName, outputVer, outputNotes]
    return info

info = notes()
print(info)
# ['file details', 'v1.2', ['Notes on line one', 'Notes on line two', 'Notes on line four, skipping line 3']]
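Note that the `if x` filter in the list comprehension above drops the bare `;` line, while the output requested in the question keeps it as `''`. A possible variant that preserves it is sketched below. The names `block` and `output_notes` are illustrative, the pattern is a slight adaptation of the one in this answer (it consumes the newline after the opening marker), and the sample text is copied from the question.

```python
import re

text = '''name = file details
version = v1.2
;----------------
; Notes on line one
; Notes on line two
;
; Notes on line four, skipping line 3
;--------------
configuring this device
configuring that device
; I don't want this note'''

# Capture everything between the two ";-----" marker lines.
block = re.search(r"^;-+$\n(.*?)^;-+$", text, re.MULTILINE | re.DOTALL)

# Strip the leading ';' and whitespace from every line, keeping blank notes.
output_notes = [line.lstrip(';').strip() for line in block.group(1).splitlines()]
print(output_notes)
```

Using `splitlines()` instead of `re.split(r'\n;?', ...)` avoids the empty strings at both ends of the split, so no filtering is needed and the blank note survives.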

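Answer 2 (score: 0)

This needs a mix of approaches, as shown below. I use named capture groups. To extract the notes I apply a regex twice: first to select the text between the ;----- lines, and then to skip any line that contains no text, only a ;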

import re

txt = '''name = file details
version = v1.2
;----------------
; Notes on line one
; Notes on line two
;
; Notes on line four, skipping line 3
;--------------
configuring this device
configuring that device
; I don't want this note'''
data = re.search(r'name\s*=\s*(?P<name>.*)\W*version\s*=\s*(?P<version>.*)\W*(?:;-+\W)(?P<notes>[\w\W]*)(?:;-+\W)',txt)
print(data.group('name'))  # prints name
print(data.group('version'))  # prints version
# print(data.group('notes'))
print([i.strip(';') for i in re.findall(r';\s*[^;]{2,}', data.group('notes'))])  # prints notes

Output:

file details
v1.2
[' Notes on line one\n', ' Notes on line two\n', ' Notes on line four, skipping line 3\n']

See the details of the first regex here.
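One way to read the first regex is to split it up with `re.VERBOSE`. The sketch below uses the same pattern as the answer, only spread over several lines; the comments are an interpretation of each piece rather than the answerer's own explanation, and the name `pattern` is introduced just for this example.

```python
import re

txt = '''name = file details
version = v1.2
;----------------
; Notes on line one
; Notes on line two
;
; Notes on line four, skipping line 3
;--------------
configuring this device
configuring that device
; I don't want this note'''

pattern = re.compile(r'''
    name\s*=\s*(?P<name>.*)        # "name = ..." up to the end of the line
    \W*                            # non-word characters (here, the newline)
    version\s*=\s*(?P<version>.*)  # "version = ..." up to the end of the line
    \W*                            # non-word characters before the marker
    (?:;-+\W)                      # opening ";-----" line plus one more character
    (?P<notes>[\w\W]*)             # anything, newlines included (greedy)
    (?:;-+\W)                      # the last ";-----" line plus one more character
''', re.VERBOSE)

data = pattern.search(txt)
print(data.group('name'))
print(data.group('version'))
print(repr(data.group('notes')))
```

Two things follow from this reading. The `notes` group is greedy, so it runs to the last `;-----` line in the text rather than the next one. The trailing `\W` also means the closing marker must be followed by at least one character, so the pattern would not match if that marker were the very last thing in the file.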