In Python

Time: 2015-08-19 16:46:26

Tags: python

I have a string in Python of the following form:

line a
line b
  line ba
  line bb
    line bba
  line bc
line c
  line ca
    line caa
line d

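You get the idea. It takes a form very similar to Python code itself: there is a line, and the more deeply indented lines below it belong to a block headed by the nearest preceding line with a smaller indent.

What I need to do is parse this text into a tree structure, so that each root-level line is a key of a dictionary whose value is a dictionary representing all of its child lines. So the above would become: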

{
'line a' => {},
'line b' => {
  'line ba' => {},
  'line bb' => {
    'line bba' => {}
    },
  'line bc' => {}
  },
'line c' => {
  'line ca' => {
    'line caa' => {}
    },
  },
'line d' => {}
}
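The `=>` notation above is shorthand rather than Python syntax. As a sketch, the same target written as a Python dict literal could look like the following; the name `expected` is only an illustrative assumption, handy for comparing against a parser's output.

```python
# Target structure for the sample input, as a plain nested dict.
# `expected` is a hypothetical name used only for this illustration.
expected = {
    'line a': {},
    'line b': {
        'line ba': {},
        'line bb': {
            'line bba': {},
        },
        'line bc': {},
    },
    'line c': {
        'line ca': {
            'line caa': {},
        },
    },
    'line d': {},
}
```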

This is what I have:

from io import StringIO  # Python 3; on Python 2 use: from StringIO import StringIO

def parse_message_to_tree(message):
    buf = StringIO(message)
    return parse_message_to_tree_helper(buf, 0)

def parse_message_to_tree_helper(buf, prev):
    ret = {}
    for line in buf:
        line = line.rstrip()
        index = len(line) - len(line.lstrip())
        print (line + " => " + str(index))
        if index > prev:
            ret[line.strip()] = parse_message_to_tree_helper(buf, index)
        else:
            ret[line.strip()] = {}

    return ret

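The print shows the lines with their left side stripped and an index of 0. I don't think lstrip() is a mutator, but either way the index should still be accurate.

Any suggestions would help.

Edit: Not sure what was going wrong before, but I tried again and it is closer to working, though still not quite right. This is what I get now: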

{'line a': {},
 'line b': {},
 'line ba': {'line bb': {},
             'line bba': {'line bc': {},
                          'line c': {},
                          'line ca': {},
                          'line caa': {},
                          'line d': {}}}}

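3 answers:

Answer 0: (score: 3)

As has already been noted, str.lstrip() is not a mutator, and the index is accurate on my system as well.

The problem is that by the time you notice that the indentation has increased, line already refers to the more deeply indented line. For example, in the first case the increase is detected at line ba, so line refers to line ba, and then inside the if condition you do this: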

ret[line.strip()] = parse_message_to_tree_helper(buf, index)

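This is wrong, because you are assigning whatever parse_message_to_tree_helper() returns to line ba instead of to its actual parent.

Also, once you recurse inside the function, you do not come back out until the buffer has been read completely, whereas the level at which a given line must be stored in the dictionary depends on returning from the recursion when the indentation decreases.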
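To make the second point concrete, here is a minimal sketch of the same pattern: a recursive helper that iterates over a shared StringIO. The names `consume` and `log` are hypothetical and the input is shortened; it only illustrates that the inner call keeps consuming lines and never hands control back when the indentation drops.

```python
from io import StringIO

def consume(buf, depth, log):
    # Record the recursion depth at which each line is seen.
    for line in buf:
        log.append((depth, line.strip()))
        if line.startswith(" "):
            # Recurse on an indented line; the nested loop shares `buf`
            # and keeps reading until the buffer is exhausted.
            consume(buf, depth + 1, log)

log = []
consume(StringIO("a\n  b\nc\n"), 0, log)
print(log)
```

Here the top-level line `c` is read by the nested call at depth 1, because nothing makes that call return once the indentation decreases.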

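I am not sure whether there is any built-in library that would help you with this, but here is the code I was able to come up with (based heavily on your code):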

from io import StringIO  # Python 3; on Python 2 use: from StringIO import StringIO

def parse_message_to_tree(message):
    buf = StringIO(message)
    return parse_message_to_tree_helper(buf, 0, None)[0]

def parse_message_to_tree_helper(buf, prev, prevline):
    ret = {}
    index = -1
    for line in buf:
        line = line.rstrip()
        index = len(line) - len(line.lstrip())
        print (line + " => " + str(index))
        if index > prev:
            ret[prevline.strip()],prevline,index = parse_message_to_tree_helper(buf, index, line)
            if index < prev:
                return ret,prevline,index
            continue
        elif not prevline:
            ret[line.strip()] = {}
        else:
            ret[prevline.strip()] = {}
        if index < prev:
            return ret,line,index
        prevline = line
    if index == -1:
        ret[prevline.strip()] = {}
        return ret,None,index
    if prev == index:
        ret[prevline.strip()] = {}
    return ret,None,0

Example/demo:

>>> print(s)
line a
line b
  line ba
  line bb
    line bba
  line bc
line c
  line ca
    line caa
>>> def parse_message_to_tree(message):
...     buf = StringIO(message)
...     return parse_message_to_tree_helper(buf, 0, None)[0]
...
>>> def parse_message_to_tree_helper(buf, prev, prevline):
...     ret = {}
...     index = -1
...     for line in buf:
...         line = line.rstrip()
...         index = len(line) - len(line.lstrip())
...         print (line + " => " + str(index))
...         if index > prev:
...             ret[prevline.strip()],prevline,index = parse_message_to_tree_helper(buf, index, line)
...             if index < prev:
...                 return ret,prevline,index
...             continue
...         elif not prevline:
...             ret[line.strip()] = {}
...         else:
...             ret[prevline.strip()] = {}
...         if index < prev:
...             return ret,line,index
...         prevline = line
...     if index == -1:
...         ret[prevline.strip()] = {}
...         return ret,None,index
...     if prev == index:
...         ret[prevline.strip()] = {}
...     return ret,None,0
...
>>> pprint.pprint(parse_message_to_tree(s))
line a => 0
line b => 0
  line ba => 2
  line bb => 2
    line bba => 4
  line bc => 2
line c => 0
  line ca => 2
    line caa => 4
{'line a': {},
 'line b': {'line ba': {}, 'line bb': {'line bba': {}}, 'line bc': {}},
 'line c': {'line ca': {'line caa': {}}}}
>>> s = """line a
... line b
...   line ba
...   line bb
...     line bba
...   line bc
... line c
...   line ca
...     line caa
... line d"""
>>> pprint.pprint(parse_message_to_tree(s))
line a => 0
line b => 0
  line ba => 2
  line bb => 2
    line bba => 4
  line bc => 2
line c => 0
  line ca => 2
    line caa => 4
line d => 0
{'line a': {},
 'line b': {'line ba': {}, 'line bb': {'line bba': {}}, 'line bc': {}},
 'line c': {'line ca': {'line caa': {}}},
 'line d': {}}

You will need to test the code for more bugs or missed cases.
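As a possible alternative for comparison, the bookkeeping with prevline can be avoided by first turning the text into a list of (indent, text) pairs and then recursing over positions in that list. This is only a sketch and not the code above; `parse_indented` and `build` are hypothetical names, and it assumes indentation is made of consistent leading whitespace and that blank lines can be skipped.

```python
def parse_indented(text):
    # Pre-compute (indent, stripped text) for every non-blank line.
    rows = [(len(l) - len(l.lstrip()), l.strip())
            for l in text.splitlines() if l.strip()]

    def build(pos, min_indent):
        # Collect rows starting at `pos` whose indent is at least
        # `min_indent`; return the subtree and the next unread position.
        tree = {}
        while pos < len(rows) and rows[pos][0] >= min_indent:
            level, name = rows[pos]
            # Children are the following rows indented deeper than `level`.
            children, pos = build(pos + 1, level + 1)
            tree[name] = children
        return tree, pos

    return build(0, 0)[0]

sample = """line a
line b
  line ba
  line bb
    line bba
  line bc
line c
  line ca
    line caa
line d"""
result = parse_indented(sample)
```

Because each call returns the position where it stopped, the caller resumes exactly at the line whose indentation dropped, which is the piece the original recursion was missing.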

Answer 1: (score: 1)

lstrip() is not a mutator; see the documentation:

  

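string.lstrip(s[, chars])

Return a copy of the string with leading characters removed. If chars is omitted or None, whitespace characters are removed. If given and not None, chars must be a string; the characters in the string will be stripped from the beginning of the string this method is called on.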

Your code seems to work with the sample text on my machine.
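A quick check along these lines shows that the original string is left untouched, so the indentation computed from the two lengths is reliable. The variable names are chosen only for this illustration.

```python
line = "    line bba"

stripped = line.lstrip()            # returns a new string
index = len(line) - len(stripped)   # number of leading whitespace characters

print(repr(line))      # the original still has its leading spaces
print(repr(stripped))
print(index)
```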

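Answer 2: (score: 1)

Another answer, using a stack instead of recursion. It took a few iterations to arrive at this version. It seems to handle several possible input scenarios, but I can't guarantee that it is completely bug-free! This is indeed a tricky problem. Hopefully my comments convey the right line of thinking. Thanks for sharing the problem.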

text = '''line a
line b
  line ba
  line bb
    line bba
  line bc
line c
  line ca
    line caa
line d'''

root_tree = {}
stack = []
prev_indent, prev_tree = -1, root_tree

for line in text.splitlines():

    # compute current line's indent and strip the line
    origlen = len(line)
    line = line.lstrip()
    indent = origlen - len(line)
    print(indent, line)

    # no matter what, every line has its own tree, so let's create it.
    tree = {}  

    # where to attach this new tree is dependent on indent, prev_indent
    # assume: stack[-1] was the right attach point for the previous line
    # then: let's adjust the stack to make that true for the current line

    if indent < prev_indent:
        while stack[-1][0] >= indent:
            stack.pop()
    elif indent > prev_indent:
        stack.append((prev_indent, prev_tree))

    # at this point: stack[-1] is the right attach point for the current line
    parent_indent, parent_tree = stack[-1]
    assert parent_indent < indent

    # attach the current tree
    parent_tree[line] = tree

    # update state
    prev_indent, prev_tree = indent, tree

print(len(stack))
print(stack)
print(root_tree)
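For reuse, the same stack idea can be wrapped in a function. The sketch below is a slight variant rather than the exact loop above: it pushes every line onto the stack and pops until the top entry is shallower than the current line. The name `parse_with_stack` is hypothetical, and skipping blank lines is an added assumption.

```python
def parse_with_stack(text):
    root = {}
    # Stack of (indent, subtree) pairs; the root acts as a sentinel at -1.
    stack = [(-1, root)]
    for raw in text.splitlines():
        if not raw.strip():
            continue  # assumption: blank lines carry no structure
        indent = len(raw) - len(raw.lstrip())
        # Drop entries that are not shallower than the current line,
        # so the top of the stack becomes this line's parent.
        while stack[-1][0] >= indent:
            stack.pop()
        node = {}
        stack[-1][1][raw.strip()] = node
        stack.append((indent, node))
    return root

tree = parse_with_stack("line a\nline b\n  line ba\n  line bb\n    line bba\n  line bc\nline c")
```

Since every indent is at least 0, the sentinel at -1 is never popped, so `stack[-1]` always exists.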