I need to extract strings from nested brackets, like so:
[ this is [ hello [ who ] [what ] from the other side ] slim shady ]
Result (order doesn't matter):
This is slim shady
Hello from the other side
Who
What
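Note that the string can have N brackets, and they will always be valid, but they may or may not be nested. Also, the string doesn't have to start with a bracket.
The solutions I have found online to similar problems suggest a regex, but I'm not sure it will work in this case.
I was thinking of implementing this similarly to how we check whether a string has all valid parentheses:
Walk through the string. If we see a [ we push its index on the stack; if we see a ], we take the substring from that index to the current position.
However, we'd need to erase that substring from the original string so it doesn't show up as part of any other output. So, instead of just pushing the index onto the stack, I was thinking of building a LinkedList as we go along, and when we find a [ we insert that Node into the LinkedList. This would allow us to easily delete the substring from the LinkedList.
Would this be a good approach, or is there a cleaner, known solution?
EDIT: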
'[ this is [ hello [ who ] [what ] from the other [side] ] slim shady ][oh my [g[a[w[d]]]]]'
Should return (order doesn't matter):
this is slim shady
hello from the other
who
what
side
oh my
g
a
w
d
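White space doesn't matter; it is trivial to remove afterwards. What matters is being able to distinguish the different contents within the brackets, either by separating them onto new lines or by having a list of strings.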
Answer 0 (score: 5)
This can be solved quite comfortably with a regex:
import re

s = '[ this is [ hello [ who ] [what ] from the other [side] ] slim shady ][oh my [g[a[w[d]]]]]'
result = []
pattern = r'\[([^[\]]*)\]'  # matches an innermost pair of square brackets (no brackets inside)
while '[' in s:  # while brackets remain
    result.extend(re.findall(pattern, s))  # find all innermost groups and add them to the list
    s = re.sub(pattern, '', s)  # then remove them, exposing the next level
# collapse runs of whitespace and drop empty strings (a list, so it also works on Python 3)
result = [t for t in (' '.join(t.split()) for t in result) if t]
# result: ['who', 'what', 'side', 'd', 'hello from the other', 'w', 'this is slim shady', 'a', 'g', 'oh my']
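The loop above relies on the brackets being balanced: with a stray [ the condition '[' in s never becomes false. As a hedged sketch (the function name extract_by_peeling and the constant INNERMOST are made up for illustration), the same peeling idea can be wrapped in a function that stops as soon as a pass removes nothing, using re.subn to get the replacement count:

```python
import re

# A bracket pair with no brackets inside it, i.e. an innermost group.
INNERMOST = re.compile(r'\[([^\[\]]*)\]')

def extract_by_peeling(s):
    """Collect bracket contents by repeatedly removing innermost pairs."""
    found = []
    while True:
        found.extend(INNERMOST.findall(s))
        s, count = INNERMOST.subn('', s)  # subn also reports how many pairs were removed
        if count == 0:
            # Nothing left to peel; stop even if unmatched brackets remain.
            break
    # Collapse whitespace runs and drop groups that end up empty.
    cleaned = (' '.join(t.split()) for t in found)
    return [t for t in cleaned if t]

print(extract_by_peeling('[ a [b] c ][d'))
```

Each pass rescans the whole remaining string, so this takes roughly one pass per nesting level; for very deep nesting a single-pass stack scan may be the better fit.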
Answer 1 (score: 5)
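This code scans the text character by character, pushes an empty list onto the stack for every opening [, and pops the last pushed list off the stack for every closing ].
text = '[ this is [ hello [ who ] [what ] from the other side ] slim shady ]'

def parse(text):
    stack = []
    for char in text:
        if char == '[':
            # stack push
            stack.append([])
        elif char == ']':
            yield ''.join(stack.pop())
        else:
            # stack peek
            stack[-1].append(char)

print(tuple(parse(text)))
Output:
(' who ', 'what ', ' hello   from the other side ', ' this is  slim shady ')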
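The scan above assumes every non-bracket character sits inside some bracket; a string that starts with plain text would hit stack[-1] on an empty stack. A minimal sketch of one way around that, assuming it is acceptable to report the outside text as one extra item at the end (the name parse_with_top_level and the ValueError checks are illustrative additions, not part of the answer):

```python
def parse_with_top_level(text):
    """Yield each bracket group's own text, then any text outside all brackets."""
    stack = [[]]  # stack[0] collects characters that are outside every bracket
    for char in text:
        if char == '[':
            stack.append([])
        elif char == ']':
            if len(stack) == 1:
                raise ValueError("unmatched ']'")
            # Join the characters of the group just closed and collapse whitespace.
            yield ' '.join(''.join(stack.pop()).split())
        else:
            stack[-1].append(char)
    if len(stack) != 1:
        raise ValueError("unmatched '['")
    outside = ' '.join(''.join(stack[0]).split())
    if outside:
        yield outside

print(list(parse_with_top_level('intro [ a [b] c ] outro')))
```

Groups are still yielded in closing order, innermost first, just as in the generator above.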
Answer 2 (score: 1)
You can represent your matches using a tree-like structure.
class BracketMatch:
    def __init__(self, refstr, parent=None, start=-1, end=-1):
        self.parent = parent
        self.start = start
        self.end = end
        self.refstr = refstr
        self.nested_matches = []

    def __str__(self):
        cur_index = self.start + 1
        result = ""
        if self.start == -1 or self.end == -1:
            return ""
        for child_match in self.nested_matches:
            if child_match.start != -1 and child_match.end != -1:
                result += self.refstr[cur_index:child_match.start]
                cur_index = child_match.end + 1
            else:
                continue
        result += self.refstr[cur_index:self.end]
        return result

# Main script
haystack = '''[ this is [ hello [ who ] [what ] from the other side ] slim shady ]'''
root = BracketMatch(haystack)
cur_match = root
for i in range(len(haystack)):
    if '[' == haystack[i]:
        new_match = BracketMatch(haystack, cur_match, i)
        cur_match.nested_matches.append(new_match)
        cur_match = new_match
    elif ']' == haystack[i]:
        cur_match.end = i
        cur_match = cur_match.parent
    else:
        continue
# Here we built the set of matches, now we must print them
nodes_list = root.nested_matches
# So we conduct a BFS to visit and print each match...
while nodes_list != []:
    node = nodes_list.pop(0)
    nodes_list.extend(node.nested_matches)
    print("Match: " + str(node).strip())
The output of this program will be:
Match: this is  slim shady
Match: hello   from the other side
Match: who
Match: what
Answer 3 (score: 1)
a = '[ this is [ hello [ who ] [what ] from the other side ] slim shady ]'
lvl = -1
words = []
# Text is grouped by nesting depth, so sibling brackets at the same depth share one entry
for i in a:
    if i == '[':
        lvl += 1
        words.append('')
    elif i == ']':
        lvl -= 1
    else:
        words[lvl] += i
for word in words:
    print(' '.join(word.split()))
This gives the o/p -
this is slim shady
hello from the other side
who what
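Because the code above indexes words by depth, sibling groups such as who and what end up merged. A hedged sketch of a small variation that keeps the same words list but gives every bracket its own slot, by tracking a stack of slot indices instead of a depth counter (the names split_by_bracket and open_ids are made up for this example; text outside all brackets is simply ignored here):

```python
def split_by_bracket(text):
    """Return one whitespace-normalized string per bracket pair, in opening order."""
    words = []     # words[k] collects the text owned by the k-th '[' seen
    open_ids = []  # stack of indices into words for brackets that are still open
    for ch in text:
        if ch == '[':
            open_ids.append(len(words))
            words.append('')
        elif ch == ']':
            open_ids.pop()
        elif open_ids:
            # Append to the innermost open bracket only.
            words[open_ids[-1]] += ch
    return [' '.join(w.split()) for w in words]

for word in split_by_bracket('[ this is [ hello [ who ] [what ] from the other side ] slim shady ]'):
    print(word)
```

With this variation the groups come out in the order their brackets open, and who and what are reported separately.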