I have an accounting tree that is stored with indents/spaces in the source:
Income
   Revenue
      IAP
      Ads
   Other-Income
Expenses
   Developers
      In-house
      Contractors
   Advertising
   Other Expenses
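There is a fixed number of levels, so I'd like to flatten the hierarchy by using 3 fields (the actual data has 6 levels; simplified here for the example):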
L1        L2            L3
Income
Income    Revenue
Income    Revenue       IAP
Income    Revenue       Ads
Income    Other-Income
Expenses  Developers    In-house
... etc
I can do this by checking the number of spaces before the account name:
for rownum in range(6,ws.max_row+1):
    accountName = str(ws.cell(row=rownum,column=1).value)
    indent = len(accountName) - len(accountName.lstrip(' '))
    if indent == 0:
        l1 = accountName
        l2 = ''
        l3 = ''
    elif indent == 3:
        l2 = accountName
        l3 = ''
    else:
        l3 = accountName
    w.writerow([l1,l2,l3])
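Is there a more flexible way to achieve this, based on the indentation of the current row compared with the previous row, rather than assuming each level is always 3 spaces? L1 will always have no indent, and we can trust that lower levels are indented further than their parent, but maybe not always by 3 spaces per level.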
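Update: I ended up with this as the core of the logic. Since I ultimately wanted the list of accounts along with the content, it seemed easiest to use the indent to decide whether to reset, append to, or pop the list: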
# requires: from operator import itemgetter
if indent == 0:
    accountList = []
    accountList.append((indent,accountName))
elif indent > prev_indent:
    accountList.append((indent,accountName))
elif indent <= prev_indent:
    max_indent = int(max(accountList,key=itemgetter(0))[0])
    while max_indent >= indent:
        accountList.pop()
        max_indent = int(max(accountList,key=itemgetter(0))[0])
    accountList.append((indent,accountName))
So, at each row of output, accountList is complete.
Answer 0 (score: 5)
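You can mimic the way Python actually parses indentation. First, create a stack that holds the indentation levels. Then, at each line, compare its indentation with the top of the stack: if it is larger, push it and increase the depth; if it is equal, stay at the same depth; if it is smaller, pop the stack while the top is larger than the new indentation. If popping stops at a level that does not match the new indentation exactly, the file is badly indented.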
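indentation = [0]  # stack of indentation widths; 0 is the top level
depth = 0

with open("test.txt", "r") as f:
    for line in f:
        # rstrip("\n") instead of line[:-1]: the slice would eat the last
        # character of a final line that has no trailing newline
        line = line.rstrip("\n")
        content = line.strip()
        indent = len(line) - len(line.lstrip())  # leading whitespace only
        if indent > indentation[-1]:
            # deeper than the current level: open a new level
            depth += 1
            indentation.append(indent)
        elif indent < indentation[-1]:
            # shallower: close levels until we reach a known indentation
            while indent < indentation[-1]:
                depth -= 1
                indentation.pop()
            if indent != indentation[-1]:
                raise RuntimeError("Bad formatting")
        print(f"{content} (depth: {depth})")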
With a "test.txt" file whose content is the same as the one you provided:
Income
   Revenue
      IAP
      Ads
   Other-Income
Expenses
   Developers
      In-house
      Contractors
   Advertising
   Other Expenses
Here is the output:
Income (depth: 0)
Revenue (depth: 1)
IAP (depth: 2)
Ads (depth: 2)
Other-Income (depth: 1)
Expenses (depth: 0)
Developers (depth: 1)
In-house (depth: 2)
Contractors (depth: 2)
Advertising (depth: 1)
Other Expenses (depth: 1)
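To connect the depth back to the L1/L2/L3 fields asked for in the question, here is a minimal sketch that wraps the same stack logic in a generator and uses the depth as an index into a fixed-width row. The names `iter_depths`, `flatten` and the `levels` parameter are made up for this illustration, and it assumes indentation is done with spaces and that blank lines can be skipped.

```python
def iter_depths(lines):
    """Yield (depth, content) for each non-blank line of an indented outline.

    Hypothetical helper: same indentation-stack idea as above, reading from
    any iterable of strings instead of a file.
    """
    indentation = [0]  # stack of indentation widths currently open
    for raw in lines:
        line = raw.rstrip("\n")
        content = line.strip()
        if not content:
            continue  # assumption: blank lines carry no structure
        indent = len(line) - len(line.lstrip(" "))
        if indent > indentation[-1]:
            indentation.append(indent)
        else:
            while indent < indentation[-1]:
                indentation.pop()
            if indent != indentation[-1]:
                raise ValueError(f"inconsistent indentation: {line!r}")
        yield len(indentation) - 1, content


def flatten(lines, levels=3):
    """Yield one row of `levels` fields per line; deeper fields are blanked."""
    row = [""] * levels
    for depth, content in iter_depths(lines):
        if depth >= levels:
            raise ValueError(f"{content!r} is nested deeper than {levels} levels")
        row[depth] = content
        # a new entry at this depth invalidates everything below it
        row[depth + 1:] = [""] * (levels - depth - 1)
        yield list(row)


# Indentation widths deliberately vary between branches.
sample = [
    "Income",
    "  Revenue",
    "      IAP",
    "      Ads",
    "  Other-Income",
    "Expenses",
    "     Developers",
]
rows = list(flatten(sample))
for r in rows:
    print(r)
```

Each row can then be passed to something like `w.writerow(row)` from the question; for the real data, `levels=6` would presumably be the value to use.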
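So, what can you do with that? Suppose you want to build nested lists. First, create a data stack. When a line indents, push a new empty list onto the data stack; when it dedents, pop the top list and append it to the new top.
In either case, for each line, append the content to the list at the top of the data stack.
Here is the corresponding implementation: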
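indentation = [0]
depth = 0
data = [[]]  # data stack; data[0] collects the top-level entries

with open("test.txt", "r") as f:
    for line in f:
        line = line.rstrip("\n")  # line[:-1] would truncate a last line without a newline
        content = line.strip()
        indent = len(line) - len(line.lstrip())
        if indent > indentation[-1]:
            depth += 1
            indentation.append(indent)
            data.append([])  # indent: start a new nested list
        elif indent < indentation[-1]:
            while indent < indentation[-1]:
                depth -= 1
                indentation.pop()
                top = data.pop()  # dedent: close the current list ...
                data[-1].append(top)  # ... and attach it to its parent
            if indent != indentation[-1]:
                raise RuntimeError("Bad formatting")
        data[-1].append(content)

# close any lists that are still open at the end of the file
while len(data) > 1:
    top = data.pop()
    data[-1].append(top)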
Your nested list is at the top of the data stack.
The output for the same file is:
['Income',
    ['Revenue',
        ['IAP',
         'Ads'
        ],
     'Other-Income'
    ],
 'Expenses',
    ['Developers',
        ['In-house',
         'Contractors'
        ],
     'Advertising',
     'Other Expenses'
    ]
]
It is easy to manipulate, although quite deeply nested. You can access the data by chaining item accesses:
>>> l = data[0]
>>> l
['Income', ['Revenue', ['IAP', 'Ads'], 'Other-Income'], 'Expenses', ['Developers', ['In-house', 'Contractors'], 'Advertising', 'Other Expenses']]
>>> l[1]
['Revenue', ['IAP', 'Ads'], 'Other-Income']
>>> l[1][1]
['IAP', 'Ads']
>>> l[1][1][0]
'IAP'
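If you would rather not chain indexes by hand, the nested list can be walked recursively to recover the full path of every account. This is only a sketch: `iter_paths` is a name invented here, and it relies on the layout shown above, where a sub-list always holds the children of the string just before it.

```python
def iter_paths(nested, prefix=()):
    """Yield the full path (a tuple of names) of every entry in a nested list.

    Assumes a sub-list contains the children of the string that immediately
    precedes it in the enclosing list.
    """
    parent = None
    for item in nested:
        if isinstance(item, list):
            if parent is None:
                raise ValueError("sub-list without a preceding parent name")
            # descend: children inherit the path of their parent
            yield from iter_paths(item, prefix + (parent,))
        else:
            parent = item
            yield prefix + (item,)


# A shortened copy of the structure built above.
tree = ['Income', ['Revenue', ['IAP', 'Ads'], 'Other-Income'],
        'Expenses', ['Developers', ['In-house', 'Contractors'], 'Advertising']]
paths = list(iter_paths(tree))
for path in paths:
    print(" > ".join(path))
```

Tuples are used for the paths so that each recursive call gets its own immutable prefix and nothing is shared between branches.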
Answer 1 (score: 2)
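If the indentation is a fixed number of spaces (3 spaces here), you can simplify the calculation of the indentation level.
Note: I use a StringIO to simulate a file.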
import io
import itertools

content = u"""\
Income
   Revenue
      IAP
      Ads
   Other-Income
Expenses
   Developers
      In-house
      Contractors
   Advertising
   Other Expenses
"""

stack = []
for line in io.StringIO(content):
    text = line.rstrip()  # drop the trailing "\n"
    row = text.split("   ")  # one empty field per 3-space indent level
    # keep the ancestors (one per indent level), then add the current name
    stack[:] = stack[:len(row) - 1] + [row[-1]]
    print("\t".join(stack))
You get:
Income
Income Revenue
Income Revenue IAP
Income Revenue Ads
Income Other-Income
Expenses
Expenses Developers
Expenses Developers In-house
Expenses Developers Contractors
Expenses Advertising
Expenses Other Expenses
Edit: the indentation is not fixed
If the indentation is not fixed (you don't always have 3 spaces), as in the example below:
content = u"""\
Income
   Revenue
      IAP
      Ads
   Other-Income
Expenses
     Developers
         In-house
         Contractors
     Advertising
     Other Expenses
"""
You need to estimate the shift for each new line:
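stack = []
last_indent = u""
for line in io.StringIO(content):
    # leading run of spaces on this line
    indent = "".join(itertools.takewhile(lambda c: c == " ", line))
    # -1, 0 or +1 depending on whether the line dedents, stays, or indents
    shift = 0 if indent == last_indent else (-1 if len(indent) < len(last_indent) else 1)
    index = len(stack) + shift
    stack[:] = stack[:index - 1] + [line.strip()]
    last_indent = indent
    print("\t".join(stack))
# Note: shift is at most one level per line, so a line that dedents by two or
# more levels at once would keep a stale ancestor in the stack.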
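The loop above moves by at most one level per line, which is enough for this sample but not for a line that jumps back several levels at once (for example from a third-level account straight to a new top-level one). One possible way around that, sketched here with a made-up function name `iter_paths_by_indent`, is to remember the indentation width next to each name and pop everything that is not a strict ancestor of the current line. It assumes space-only indentation.

```python
import io


def iter_paths_by_indent(lines):
    """Yield the list of ancestor names (root first) for each non-blank line."""
    stack = []  # (indent_width, name) pairs describing the current path
    for raw in lines:
        name = raw.strip()
        if not name:
            continue  # assumption: blank lines are ignored
        indent = len(raw) - len(raw.lstrip(" "))
        # Drop every entry indented at least as much as this line: those are
        # siblings or deeper descendants, not ancestors.
        while stack and stack[-1][0] >= indent:
            stack.pop()
        stack.append((indent, name))
        yield [n for _, n in stack]


# "Expenses" follows a third-level entry directly: a two-level dedent.
text = """\
Income
  Revenue
      IAP
Expenses
    Developers
"""
paths = list(iter_paths_by_indent(io.StringIO(text)))
print("\n".join("\t".join(p) for p in paths))
```

Unlike the stack-based parser in the other answer, this version never raises on a dedent that lands between two known levels; it simply attaches the line to the nearest shallower ancestor, which may or may not be what you want for malformed input.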