Hi guys, after scraping some data with Beautiful Soup... I want to format that data so that I can easily export it to CSV and JSON.
My question is how to turn this:
Heading :
	Subheading :
AnotherHeading :
	AnotherSubheading :
		Somedata
Heading :
	Subheading :
AnotherHeading :
	AnotherSubheading :
		Somedata
into:
[
    ['Heading',['Subheading']],
    ['AnotherHeading',['AnotherSubheading',['Somedata']]],
    ['Heading',['Subheading']],
    ['AnotherHeading',['AnotherSubheading',['Somedata']]]
]
Indented for clarity.
Any rescue attempt will be warmly appreciated!
So far, with some help, we have:
def parse(data):
    stack = [[]]
    levels = [0]
    current = stack[0]
    for line in data.splitlines():
        indent = len(line)-len(line.lstrip())
        if indent > levels[-1]:
            levels.append(indent)
            stack.append([])
            current.append(stack[-1])
            current = stack[-1]
        elif indent < levels[-1]:
            stack.pop()
            current = stack[-1]
            levels.pop()
        current.append(line.strip().rstrip(':'))
    return stack
The problem with this code is that it returns...
[
'Heading ',
['Subheading '],
'AnotherHeading ',
['AnotherSubheading ', ['Somedata'], 'Heading ', 'Subheading '], 'AnotherHeading ',
['AnotherSubheading ', ['Somedata']]
]
Here is a repl: https://repl.it/yvM/1
Answer 0 (score: 1)
Thank you both, kirbyfan64sos and SuperBiasedMan.
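def parse(data):
    currentTab = 0
    currentList = []
    result = [currentList]
    for line in data.splitlines():
        # Indentation depth: number of leading whitespace characters (tabs here).
        tabCount = len(line) - len(line.lstrip())
        # Drop surrounding whitespace and the trailing " :".
        line = line.strip().rstrip(' :')
        if tabCount == currentTab:
            # Same level: stay in the current list.
            currentList.append(line)
        elif tabCount > currentTab:
            # Deeper level: nest a new list inside the current one.
            newList = [line]
            currentList.append(newList)
            currentList = newList
        elif tabCount == 0:
            # Back to the top level: start a new main entry.
            currentList = [line]
            result.append(currentList)
        elif tabCount == 1:
            # Back to one tab deep: nest under the last main entry.
            currentList = [line]
            result[-1].append(currentList)
        currentTab = tabCount
    print(result)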
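Once the data is in that nested shape, exporting it is fairly direct. Below is a minimal sketch using only the standard library. It assumes every list has the form [label, child_list, ...] as in the question's desired output; `flatten_rows` is a hypothetical helper introduced here for illustration, and writing one CSV row per leaf path is just one possible layout.

```python
import csv
import io
import json

# Nested result in the shape the question asks for (shortened sample).
result = [
    ['Heading', ['Subheading']],
    ['AnotherHeading', ['AnotherSubheading', ['Somedata']]],
]

def flatten_rows(node, prefix=()):
    """Yield one tuple per leaf path.

    Assumes node is [label, child, child, ...] where each child is a list
    of the same form (hypothetical helper, not part of the answer above).
    """
    label, children = node[0], node[1:]
    path = prefix + (label,)
    if not children:
        yield path
    for child in children:
        yield from flatten_rows(child, path)

# JSON: nested lists serialize as nested arrays.
json_text = json.dumps(result, indent=2)

# CSV: one row per path from a top-level heading down to a leaf.
buffer = io.StringIO()
writer = csv.writer(buffer)
for entry in result:
    writer.writerows(flatten_rows(entry))
csv_text = buffer.getvalue()
```

To write files instead of strings, the same calls work with `json.dump(result, f)` and `csv.writer(f)` on an open file object.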
Answer 1 (score: 0)
Well, first you want to clear out unnecessary whitespace, so you make a list of all the lines that contain something more than whitespace and set up all the defaults that you start from for the main loop.
teststring = [line for line in teststring.split('\n') if line.strip()]
currentTab = 0
currentList = []
result = [currentList]
This method relies on the mutability of lists, so setting currentList as an empty list and then setting result to [currentList] is an important step, since we can now append to currentList.
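A quick illustration of why that works: result stores a reference to the same list object, so anything appended through currentList shows up inside result as well. This is only a small sketch reusing the same variable names.

```python
currentList = []
result = [currentList]  # result[0] is the very same object as currentList

# Mutating the list through one name is visible through the other.
currentList.append('Heading')
print(result)  # [['Heading']]

# Rebinding the name to a new list does NOT change what result already holds,
# which is why the new list has to be appended explicitly.
currentList = ['AnotherHeading']
result.append(currentList)
print(result)  # [['Heading'], ['AnotherHeading']]
```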
for line in teststring:
    i, tabCount = 0, 0
    while line[i] == ' ':
        tabCount += 1
        i += 1
    tabCount /= 8
This is the best way I could think of to check for tab characters at the start of each line. Also, yes, you'll notice I actually checked for spaces, not tabs. Tabs just 100% didn't work; I think it was because I was using repl.it, since I don't have Python 3 installed. It works perfectly fine on Python 2.7, but I won't put up code I haven't verified works. I can edit this if you confirm that using \t and removing tabCount /= 8 produces the desired results.
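If the input really is indented with tab characters, one possible way to count them without the character-by-character loop is to compare the line's length before and after stripping leading tabs. This is a sketch under that assumption; `count_leading_tabs` is a hypothetical name, not something from the code above.

```python
def count_leading_tabs(line):
    """Return how many tab characters the line starts with.

    str.lstrip('\t') removes only leading tabs, so the length difference
    is the indentation depth. Assumes tabs, not spaces, are used.
    """
    return len(line) - len(line.lstrip('\t'))

print(count_leading_tabs('Heading :'))       # 0
print(count_leading_tabs('\tSubheading :'))  # 1
print(count_leading_tabs('\t\tSomedata'))    # 2
```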
Now, check how indented the line is. If it's the same as our currentTab value, then just append to the currentList.
    if tabCount == currentTab:
        currentList.append(line.strip())
If it's higher, that means we've gone to a deeper list level. We need a new list nested in currentList.
    elif tabCount > currentTab:
        newList = [line.strip()]
        currentList.append(newList)
        currentList = newList
Going backwards is trickier. Since the data only contains 3 nesting levels, I opted to hardcode what to do with the values 0 and 1 (2 should always result in one of the above blocks). If there are no tabs, we can append a new list to result.
    elif tabCount == 0:
        currentList = [line.strip()]
        result.append(currentList)
It's mostly the same for a one tab deep heading, except that you should append to result[-1], as that's the last main heading to nest into.
    elif tabCount == 1:
        currentList = [line.strip()]
        result[-1].append(currentList)
Lastly, make sure currentTab is updated to what our current tabCount is so the next iteration behaves properly.
    currentTab = tabCount
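The hardcoded branches for 0 and 1 tabs could be generalized to any depth by remembering the most recently opened list at each level. The following is only a sketch of that idea, not the method walked through above: `parse_outline` and `open_lists` are hypothetical names, it assumes tab-indented input whose depth grows by one tab at a time, and it strips the trailing " :" from each label. Unlike the walkthrough, every line opens its own list, so two siblings at the same depth become two separate child lists.

```python
def parse_outline(text):
    """Parse tab-indented lines into nested [label, child, ...] lists."""
    result = []
    open_lists = []  # open_lists[d] is the most recent list opened at depth d
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        depth = len(line) - len(line.lstrip('\t'))
        node = [line.strip().rstrip(' :')]
        # Forget every list that sits at this depth or deeper.
        del open_lists[depth:]
        # The parent is the closest shallower list, or the top-level result.
        parent = open_lists[-1] if open_lists else result
        parent.append(node)
        open_lists.append(node)
    return result

sample = (
    "Heading :\n"
    "\tSubheading :\n"
    "AnotherHeading :\n"
    "\tAnotherSubheading :\n"
    "\t\tSomedata\n"
)
print(parse_outline(sample))
# [['Heading', ['Subheading']], ['AnotherHeading', ['AnotherSubheading', ['Somedata']]]]
```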
Answer 2 (score: -2)
Something like this:
def parse(data):
    stack = [[]]
    levels = [0]
    current = stack[0]
    for line in data.splitlines():
        indent = len(line)-len(line.lstrip())
        if indent > levels[-1]:
            levels.append(indent)
            stack.append([])
            current.append(stack[-1])
            current = stack[-1]
        elif indent < levels[-1]:
            stack.pop()
            current = stack[-1]
            levels.pop()
        current.append(line.strip().rstrip(':'))
    return stack[0]
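The merged output reported in the question seems to come from the elif branch popping only one level when the indentation drops by two at once (from Somedata straight back to Heading). Here is a sketch of the same stack approach that pops until the levels match again. `parse_fixed` is a hypothetical name, and stripping the trailing " :" and skipping blank lines are additions made here. Note that it still returns headings and their child lists as siblings in one flat top-level list, not the paired format the question asks for.

```python
def parse_fixed(data):
    stack = [[]]   # stack[-1] is the list currently being filled
    levels = [0]   # indentation width that opened each list on the stack
    for line in data.splitlines():
        if not line.strip():
            continue  # skip blank lines
        indent = len(line) - len(line.lstrip())
        if indent > levels[-1]:
            # Deeper: open a new nested list inside the current one.
            new_list = []
            stack[-1].append(new_list)
            stack.append(new_list)
            levels.append(indent)
        else:
            # Shallower: pop once per level being closed, not just once.
            while indent < levels[-1]:
                stack.pop()
                levels.pop()
        stack[-1].append(line.strip().rstrip(' :'))
    return stack[0]

sample = (
    "Heading :\n"
    "\tSubheading :\n"
    "AnotherHeading :\n"
    "\tAnotherSubheading :\n"
    "\t\tSomedata\n"
    "Heading :\n"
    "\tSubheading :\n"
)
print(parse_fixed(sample))
```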
Your format looks a lot like YAML; you might want to take a look at PyYAML.