Parsing nested, indented text into lists
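Hi,
maybe someone can give me a hint to get started.
I have nested, indented text like the TXT below. I need to parse it into a nested list structure like Nested_Lists: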
TXT = r"""
Test1
    NeedHelp
        GotStuck
            Sometime
            NoLuck
    NeedHelp2
        StillStuck
        GoodLuck
"""
Nested_Lists = ['Test1',
    ['NeedHelp',
        ['GotStuck',
            ['Sometime',
            'NoLuck']]],
    ['NeedHelp2',
        ['StillStuck',
        'GoodLuck']]
]
Nested_Lists = ['Test1', ['NeedHelp', ['GotStuck', ['Sometime', 'NoLuck']]], ['NeedHelp2', ['StillStuck', 'GoodLuck']]]
Any help with Python 3 would be appreciated.
Answer 0 (score: 7)
You can use the Python tokenizer to parse the indented text:
from tokenize import NAME, INDENT, DEDENT, tokenize

def parse(file):
    stack = [[]]             # stack[-1] is the list currently being filled
    lastindent = len(stack)  # stack depth when the last list was opened

    def push_new_list():
        stack[-1].append([])
        stack.append(stack[-1][-1])
        return len(stack)

    for t in tokenize(file.readline):
        if t.type == NAME:
            if lastindent != len(stack):
                # first name after a dedent: close the current list
                # and open a new sibling list
                stack.pop()
                lastindent = push_new_list()
            stack[-1].append(t.string)  # add to current list
        elif t.type == INDENT:
            lastindent = push_new_list()
        elif t.type == DEDENT:
            stack.pop()
    return stack[-1]
Example:
from io import BytesIO
from pprint import pprint
pprint(parse(BytesIO(TXT.encode('utf-8'))), width=20)
['Test1',
 ['NeedHelp',
  ['GotStuck',
   ['Sometime',
    'NoLuck']]],
 ['NeedHelp2',
  ['StillStuck',
   'GoodLuck']]]
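To see why this works, it can help to look at the token stream itself. The sketch below is my own illustration, not part of the answer: token_names is a hypothetical helper, and the sample text is made up. It lists the token types that tokenize emits, including the INDENT and DEDENT tokens that parse relies on. Note that the text has to tokenize as Python source, so an inconsistent dedent, for example, raises IndentationError.

```python
from io import BytesIO
from tokenize import tokenize, tok_name

def token_names(text):
    """Return (token type name, token string) pairs for the given text.

    Hypothetical helper, only meant for inspecting the token stream.
    """
    stream = BytesIO(text.encode('utf-8'))
    return [(tok_name[t.type], t.string) for t in tokenize(stream.readline)]

# Made-up sample: one indented line between two top-level lines.
sample = "a\n    b\nc\n"
for name, string in token_names(sample):
    print(name, repr(string))
```

An INDENT token appears before b and a DEDENT token before c, which is exactly where parse pushes and pops its stack.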
Answer 1 (score: 4)
I hope you can understand my solution. If not, just ask.
def nestedbyindent(string, indent_char=' '):
    splitted, i = string.splitlines(), 0

    def first_non_indent_char(string):
        # index of the first character that is not indent_char
        for i, c in enumerate(string):
            if c != indent_char:
                return i
        return -1

    def subgenerator(indent):
        nonlocal i
        while i < len(splitted):
            s = splitted[i]
            title = s.lstrip()
            if not title:
                # skip blank lines
                i += 1
                continue
            curr_indent = first_non_indent_char(s)
            if curr_indent < indent:
                break  # back to a shallower level: let the caller handle it
            elif curr_indent == indent:
                i += 1
                yield title
            else:
                # deeper level: collect it as a nested list
                yield list(subgenerator(curr_indent))

    # start at indent 0; starting at -1 wraps the result in an extra list
    return list(subgenerator(0))

>>> nestedbyindent(TXT)
['Test1', ['NeedHelp', ['GotStuck', ['Sometime', 'NoLuck']],
 'NeedHelp2', ['StillStuck', 'GoodLuck']]]
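In case the recursion is hard to follow, the same grouping idea can be sketched iteratively with an explicit stack. This is my own illustration under stated assumptions, not the answerer's code: nested_by_indent_iter is a hypothetical name, indentation is assumed to be spaces only, and the sample text is made up. As in the output above, siblings such as NeedHelp and NeedHelp2 land in the same list, which differs from the Nested_Lists shape in the question.

```python
def nested_by_indent_iter(text):
    """Group lines into nested lists by leading-space count (hypothetical helper)."""
    root = []
    # Each stack entry is (indent width, list collecting items at that indent).
    stack = [(0, root)]
    for line in text.splitlines():
        title = line.strip()
        if not title:
            continue  # skip blank lines
        indent = len(line) - len(line.lstrip(' '))
        # Close every level that is deeper than the current line.
        while len(stack) > 1 and indent < stack[-1][0]:
            stack.pop()
        if indent > stack[-1][0]:
            # Deeper than the current level: open a new sublist.
            child = []
            stack[-1][1].append(child)
            stack.append((indent, child))
        stack[-1][1].append(title)
    return root

sample = "Test1\n    NeedHelp\n        GotStuck\n    NeedHelp2\n"
print(nested_by_indent_iter(sample))
# ['Test1', ['NeedHelp', ['GotStuck'], 'NeedHelp2']]
```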
Answer 2 (score: 0)
This is a non-Pythonic and verbose answer, but it seems to work.
TXT = r"""
Test1
    NeedHelp
        GotStuck
            Sometime
            NoLuck
    NeedHelp2
        StillStuck
        GoodLuck
"""
outString = '['
level = 0
first = True
for i in TXT.split("\n")[1:]:
    # count leading spaces
    count = 0
    for j in i:
        if j != ' ':
            break
        count += 1
    count //= 4  # 4 spaces = 1 indent (integer division, so '['*n works in Python 3)
    if i.lstrip() != '':
        itemStr = "'" + i.lstrip() + "'"
    else:
        itemStr = ''
    if level < count:
        # deeper: open one bracket per new level
        if first:
            outString += '['*(count - level) + itemStr
            first = False
        else:
            outString += ',' + '['*(count - level) + itemStr
    elif level > count:
        # shallower: close one bracket per level left
        outString += ']'*(level - count) + ',' + itemStr
    else:
        if first:
            outString += itemStr
            first = False
        else:
            outString += ',' + itemStr
    level = count
if len(outString) > 1:
    # drop the trailing comma and close the outer list
    outString = outString[:-1] + ']'
else:
    outString = '[]'
output = eval(outString)
#['Test1', ['NeedHelp', ['GotStuck', ['Sometime', 'NoLuck']], 'NeedHelp2', ['StillStuck', 'GoodLuck']]]
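A side note on the last step: this answer builds a list literal as a string and passes it to eval. If the input text is not fully trusted, ast.literal_eval from the standard library might be a safer choice, since it only accepts Python literals. The sketch below is my own illustration with a made-up string, not part of the answer. Either way, an item containing a single quote would still break the string being built.

```python
import ast

# Made-up list literal in the same shape as outString above.
out_string = "['Test1', ['NeedHelp', ['GotStuck']], 'NeedHelp2']"
output = ast.literal_eval(out_string)
print(output[1][0])  # NeedHelp

# Anything that is not a plain literal is rejected instead of executed.
try:
    ast.literal_eval("__import__('os')")
except ValueError as exc:
    print("rejected:", type(exc).__name__)
```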
Answer 3 (score: 0)
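If whole lines are to be kept, and those lines contain more than just variable names, t.type == NAME can be replaced with t.type == NEWLINE, and that if statement can append the stripped line instead of t.string.
Otherwise, the lines are split at every token boundary, which includes spaces, parentheses, brackets, and so on.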
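The snippet that belonged to this answer did not survive here, so the following is only a sketch of the described change applied to parse from Answer 0. The name parse_lines, the use of t.line.strip() for "the stripped line", and the sample text are my assumptions. The sample ends with a newline and, like the TXT in the question, has a single top-level line.

```python
from io import BytesIO
from tokenize import NEWLINE, INDENT, DEDENT, tokenize

def parse_lines(file):
    """Variant of parse from Answer 0 that keeps whole lines (a sketch)."""
    stack = [[]]
    lastindent = len(stack)

    def push_new_list():
        stack[-1].append([])
        stack.append(stack[-1][-1])
        return len(stack)

    for t in tokenize(file.readline):
        if t.type == NEWLINE:  # one NEWLINE token per logical line
            if lastindent != len(stack):
                stack.pop()
                lastindent = push_new_list()
            # t.line is the physical line the token came from (assumed
            # to be what the answer means by "the stripped line")
            stack[-1].append(t.line.strip())
        elif t.type == INDENT:
            lastindent = push_new_list()
        elif t.type == DEDENT:
            stack.pop()
    return stack[-1]

sample = (
    "Test 1\n"
    "    Need help (now)\n"
    "        Got stuck\n"
    "    Need help 2\n"
    "        Still stuck\n"
)
print(parse_lines(BytesIO(sample.encode('utf-8'))))
# ['Test 1', ['Need help (now)', ['Got stuck']], ['Need help 2', ['Still stuck']]]
```

With t.type == NAME, the line "Need help (now)" would instead be added as the separate items 'Need', 'help' and 'now'.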