How can I parse a formatted file into variables using Python?

Time: 2013-06-29 08:57:05

Tags: python parsing

I have a pre-formatted text file that contains some variables, like this:

header one
   name = "this is my name"
   last_name = "this is my last name"
   addr = "somewhere"
   addr_no = 35
header
header two
   first_var = 1.002E-3
   second_var = -2.002E-8
header 

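As you can see, each section starts with the string header followed by the name of the scope (one, two, etc.), and ends with a line containing only header.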

I can't figure out how to parse these options programmatically in Python so that they are accessible in my script like this:

one.name = "this is my name"
one.last_name = "this is my last name"
two.first_var = 1.002E-3

Can anyone point me to a tutorial, a library, or a specific part of the documentation that would help me achieve this?

3 answers:

Answer 0 (score: 4)

I'd parse this with a generator, yielding sections as the file is parsed. ast.literal_eval() takes care of interpreting each value as a Python literal:

import ast

def load_sections(filename):
    with open(filename, 'r') as infile:
        for line in infile:
            if not line.startswith('header'):
                continue  # skip to the next line until we find a header

            sectionname = line.split(None, 1)[-1].strip()
            section = {}
            for line in infile:
                if line.startswith('header'):
                    break  # end of section
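                line = line.strip()
                if not line:
                    continue  # ignore blank lines inside a section
                key, value = line.split(' = ', 1)
                # literal_eval safely turns '"text"', '35' or '1.0E-3' into Python values
                section[key] = ast.literal_eval(value)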

            yield sectionname, section

Loop over the function above to receive (name, section_dict) tuples:

for name, section in load_sections(somefilename):
    print(name, section)

For your sample input data, this results in:

>>> for name, section in load_sections('/tmp/example'):
...     print(name, section)
... 
one {'name': 'this is my name', 'last_name': 'this is my last name', 'addr': 'somewhere', 'addr_no': 35}
two {'first_var': 0.001002, 'second_var': -2.002e-08}
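
The question asked for attribute-style access (one.name), while the generator above yields plain dictionaries. Below is a minimal sketch of one way to bridge the two, assuming Python 3.3+ for types.SimpleNamespace. The names parse_sections and load_namespaces are made up for this illustration, and parse_sections is a condensed variant of the generator that reads from any iterable of lines so the example runs without a file:

```python
import ast
from types import SimpleNamespace

# Sample data copied from the question.
SAMPLE = '''\
header one
   name = "this is my name"
   last_name = "this is my last name"
   addr = "somewhere"
   addr_no = 35
header
header two
   first_var = 1.002E-3
   second_var = -2.002E-8
header
'''

def parse_sections(lines):
    """Yield (name, dict) pairs; hypothetical condensed form of load_sections."""
    lines = iter(lines)
    for line in lines:
        parts = line.split()
        # A section opens with "header <name>"; anything else is skipped.
        if len(parts) != 2 or parts[0] != 'header':
            continue
        section = {}
        for line in lines:
            line = line.strip()
            if line == 'header':
                break  # a bare "header" closes the section
            if not line:
                continue  # tolerate blank lines
            key, value = line.split(' = ', 1)
            section[key] = ast.literal_eval(value)
        yield parts[1], section

def load_namespaces(lines):
    """Wrap every section in a SimpleNamespace, all held in one outer namespace."""
    return SimpleNamespace(**{
        name: SimpleNamespace(**section)
        for name, section in parse_sections(lines)
    })

cfg = load_namespaces(SAMPLE.splitlines())
print(cfg.one.name)
print(cfg.two.first_var)
```

This only works cleanly when section names and keys are valid Python identifiers, which holds for the sample data.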

Answer 1 (score: 2)

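Martijn Pieters' answer is correct for the pre-formatted file as given, but if you can format the file differently in the first place, you will avoid a lot of potential bugs. If I were you, I would look into getting the file written as JSON (or XML), because then you can use Python's json (or XML) libraries to do the work for you. http://docs.python.org/2/library/json.html. Unless you are working with really bad legacy code or a system you have no access to, you should be able to go into the code that produces the file and make it give you a better one.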
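
For comparison, here is a sketch of what the JSON route could look like. The layout (one top-level object keyed by section name) is an assumption about how the file might be restructured, not something the question specifies:

```python
import json

# Hypothetical JSON rendering of the sample data from the question.
text = '''
{
  "one": {
    "name": "this is my name",
    "last_name": "this is my last name",
    "addr": "somewhere",
    "addr_no": 35
  },
  "two": {
    "first_var": 1.002E-3,
    "second_var": -2.002E-8
  }
}
'''

# json.loads parses a string; json.load(f) would read from an open file instead.
sections = json.loads(text)
print(sections['one']['name'])
print(sections['two']['first_var'])
```

Strings, integers and floats come back as the matching Python types, so no separate value-conversion step is needed.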

Answer 2 (score: 1)

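def get_section(f):
    # Collect stripped lines up to and including the bare 'header' terminator.
    section = []
    for line in f:
        section.append(line.strip())
        if section[-1] == 'header':
            break
    return section

sections = dict()
with open('input') as f:
    while True:
        section = get_section(f)
        if not section:
            break  # end of file
        section_dict = dict()
        section_dict['sname'] = section[0].split()[1]
        # section[0] is 'header <name>' and section[-1] is the terminator,
        # so the parameters are everything in between.
        for param in section[1:-1]:
            # Split on the first '=' only, so values may contain '='.
            k, v = [x.strip() for x in param.split('=', 1)]
            section_dict[k] = v  # values are kept as raw strings
        sections[section_dict['sname']] = section_dict

print(sections['one']['name'])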

You can also access the sections as attributes:

class Section:
    def __init__(self, d):
        self.__dict__ = d

one = Section(sections['one'])
print(one.name)
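
Note that the parser in this answer stores every value as the raw text after the equals sign, so the name value still carries its double quotes and addr_no is the string '35'. Below is a small sketch of converting such values, using ast.literal_eval as in the first answer and falling back to the raw string when the text is not a valid Python literal. The helper name convert is hypothetical:

```python
import ast

def convert(raw):
    """Return the Python value for a literal such as '"abc"' or '35'."""
    try:
        return ast.literal_eval(raw)
    except (ValueError, SyntaxError):
        return raw  # not a literal: keep the text unchanged

# Raw strings as the parser above would store them.
raw_section = {'name': '"this is my name"', 'addr_no': '35', 'first_var': '1.002E-3'}
typed_section = {key: convert(value) for key, value in raw_section.items()}
print(typed_section)
```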