Parsing nested indented text into lists

Time: 2014-03-21 03:16:02

Tags: parsing python-3.x text-indent

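Hi,

Maybe someone can help me get started.

I have nested, indented text similar to the TXT sample below. I need to parse it into a nested list structure like the Nested_Lists value that follows it: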

TXT = r"""
Test1
    NeedHelp
        GotStuck
            Sometime
            NoLuck
    NeedHelp2
        StillStuck
        GoodLuck
"""

Nested_Lists = ['Test1', 
    ['NeedHelp', 
        ['GotStuck', 
            ['Sometime', 
            'NoLuck']]], 
    ['NeedHelp2', 
        ['StillStuck', 
        'GoodLuck']]
]

Nested_Lists = ['Test1', ['NeedHelp', ['GotStuck', ['Sometime', 'NoLuck']]], ['NeedHelp2', ['StillStuck', 'GoodLuck']]]

Any help with Python 3 would be appreciated.

4 answers:

Answer 0 (score: 7)

You can exploit the Python tokenizer to parse the indented text:

from tokenize import NAME, INDENT, DEDENT, tokenize

def parse(file):
    stack = [[]]
    lastindent = len(stack)

    def push_new_list():
        stack[-1].append([])
        stack.append(stack[-1][-1])
        return len(stack)

    for t in tokenize(file.readline):
        if t.type == NAME:
            if lastindent != len(stack):
                # after a dedent: close the previous sibling's list and open a new one
                stack.pop()
                lastindent = push_new_list()
            stack[-1].append(t.string) # add to current list
        elif t.type == INDENT:
            lastindent = push_new_list()
        elif t.type == DEDENT:
            stack.pop()
    return stack[-1]

Example:

from io import BytesIO
from pprint import pprint
pprint(parse(BytesIO(TXT.encode('utf-8'))), width=20)

Output:

['Test1',
 ['NeedHelp',
  ['GotStuck',
   ['Sometime',
    'NoLuck']]],
 ['NeedHelp2',
  ['StillStuck',
   'GoodLuck']]]
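
To see why this works, it can help to look at the token stream itself. The following is a minimal sketch and not part of the answer above; the helper name `indent_events` is hypothetical. It lists only the NAME, INDENT and DEDENT tokens that `tokenize` emits for the sample text, which are the three token types `parse` reacts to.

```python
from io import BytesIO
from tokenize import NAME, INDENT, DEDENT, tok_name, tokenize

TXT = r"""
Test1
    NeedHelp
        GotStuck
            Sometime
            NoLuck
    NeedHelp2
        StillStuck
        GoodLuck
"""

def indent_events(text):
    """Return (token type name, token text) pairs for NAME/INDENT/DEDENT tokens.

    indent_events is a hypothetical helper, used here for illustration only.
    """
    readline = BytesIO(text.encode('utf-8')).readline
    return [(tok_name[t.type], t.string.strip())
            for t in tokenize(readline)
            if t.type in (NAME, INDENT, DEDENT)]

for kind, text in indent_events(TXT):
    print(kind, repr(text))
```

Each step to a deeper level shows up as one INDENT token, and each return to a shallower level as one DEDENT token per level closed, so the tokenizer does the indentation bookkeeping that would otherwise have to be written by hand.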

Answer 1 (score: 4)

I hope you can understand my solution. If not, just ask.

def nestedbyindent(string, indent_char=' '):
    splitted, i = string.splitlines(), 0
    def first_non_indent_char(string):
        for i, c in enumerate(string):
            if c != indent_char:
                return i
        return -1
    def subgenerator(indent):
        nonlocal i
        while i < len(splitted):
            s = splitted[i]
            title = s.lstrip()
            if not title:
                i += 1
                continue
            curr_indent = first_non_indent_char(s)
            if curr_indent < indent:
                break
            elif curr_indent == indent:
                i += 1
                yield title
            else:
                yield list(subgenerator(curr_indent))
    return list(subgenerator(0))  # top-level items are assumed to start at column 0

>>> nestedbyindent(TXT)
['Test1', ['NeedHelp', ['GotStuck', ['Sometime', 'NoLuck']],
'NeedHelp2',['StillStuck', 'GoodLuck']]]

Answer 2 (score: 0)

This answer is un-Pythonic and verbose, but it seems to work.

TXT = r"""
Test1
    NeedHelp
        GotStuck
            Sometime
            NoLuck
    NeedHelp2
        StillStuck
        GoodLuck
"""

outString = '['
level = 0
first = 1
for i in TXT.split("\n")[1:]:
    count = 0
    for j in i:
        if j!=' ':
            break
        count += 1
    count //= 4 # 4 spaces = 1 indent; integer division keeps count an int in Python 3
    if i.lstrip()!='':
        itemStr = "'" + i.lstrip() + "'"
    else:
        itemStr = ''
    if level < count:
        if first:
            outString += '['*(count - level) + itemStr
            first = 0
        else:
            outString += ',' + '['*(count - level) + itemStr
    elif level > count:
        outString += ']'*(level - count) + ',' + itemStr
    else:
        if first:
            outString += itemStr
            first = False
        else:
            outString += ',' + itemStr
    level = count
if len(outString)>1:
    outString = outString[:-1] + ']'
else:
    outString = '[]'

output = eval(outString)
#['Test1', ['NeedHelp', ['GotStuck', ['Sometime', 'NoLuck']], 'NeedHelp2', ['StillStuck', 'GoodLuck']]]
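
Because the string built above contains only list and string literals, `ast.literal_eval` from the standard library could stand in for `eval`; it evaluates literals only and refuses arbitrary expressions. A minimal sketch, assuming a string of the same shape as `outString` (the variable name `out_string` and its sample value are illustrative, not taken from the answer):

```python
import ast

# Illustrative stand-in for the string that the loop above builds.
out_string = "['Test1',['NeedHelp',['GotStuck',['Sometime','NoLuck']],'NeedHelp2',['StillStuck','GoodLuck']]]"

# literal_eval accepts Python literals (lists, strings, numbers, ...) only.
output = ast.literal_eval(out_string)
print(output)

# A non-literal expression is rejected instead of being executed.
try:
    ast.literal_eval("__import__('os').getcwd()")
except ValueError as exc:
    print("rejected:", exc)
```

Either way, an item that itself contains a single quote would break the generated string, since items are wrapped in `'...'` without escaping.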

Answer 3 (score: 0)

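If whole lines are to be kept, and the lines contain more than just variable names, then in answer 0 the test t.type == NAME can be replaced with t.type == NEWLINE, and that if statement can append the stripped line instead of t.string.

Otherwise, each line is split at every token, where tokens include whitespace, parentheses, brackets, and so on.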
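
The change this answer describes might look like the following sketch. It is a reconstruction under assumptions rather than the answerer's exact code: it copies the `parse` function from answer 0, renames it `parse_lines` (a hypothetical name), and uses `t.line.strip()` to obtain the stripped line.

```python
from io import BytesIO
from tokenize import NEWLINE, INDENT, DEDENT, tokenize

def parse_lines(file):
    """Variant of parse() from answer 0 that stores whole lines, not NAME tokens."""
    stack = [[]]
    lastindent = len(stack)

    def push_new_list():
        stack[-1].append([])
        stack.append(stack[-1][-1])
        return len(stack)

    for t in tokenize(file.readline):
        if t.type == NEWLINE:  # ends each logical line (blank lines produce NL instead)
            if lastindent != len(stack):
                stack.pop()
                lastindent = push_new_list()
            # Assumption: t.line holds the physical line the token came from.
            stack[-1].append(t.line.strip())
        elif t.type == INDENT:
            lastindent = push_new_list()
        elif t.type == DEDENT:
            stack.pop()
    return stack[-1]

SAMPLE = "fruit list\n    red apple\n    green pear\n"
print(parse_lines(BytesIO(SAMPLE.encode('utf-8'))))
```

The text still has to be tokenizable as Python source, so content such as an unmatched quote may produce error tokens or raise an exception, depending on the Python version.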