I have a list like this:
boo = ['<a>', '<b>', '<c>', '</c>', '</b>', '</a>']
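I'm trying to iterate over it and find adjacent matching pairs, e.g. '<c>', '</c>', and delete those elements. They must be next to each other and match in order to be deleted. After a pair is deleted, the list is scanned again, and deletion continues until the list is empty or nothing more can be removed.
I was thinking of something like: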
for i in range(len(boo)):
    for b in boo:
        if boo[i]== '</'+ b +'>' and boo[i-1] == '<' + b +'>':
            boo.remove(boo[i])
            boo.remove(boo[i-1])
print(boo)
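But this doesn't seem to do anything. Can someone point out my problem?
EDIT
I changed it to something more like this, but now it says i is not defined. How do I define i?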
def valid_html1(test_strings):
    valid = []
    for h in test_strings:
        boo = re.findall('\W+\w+\W', h)
        while i in boo == boo[i]:
            if boo[i][1:] == boo[i+1][2:]:
                boo.remove(boo[i])
                boo.remove(boo[i+1])
        print(boo)
valid_html1(example_set)
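Answer 0: (score: 1)
Before comparing, you should parse the strings to extract the tag names from inside the angle brackets. You can use zip to pair each tag with its neighbour, and append an item to the new list only when it and the next item are not a matching opening/closing pair: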
boo = ['<a>', '<b>', '<c>', '</c>', '</b>', '</a>']
while True:
    pairs = zip(boo, boo[1:] + [''])
    new_boo = []
    for a, b in pairs:
        if a.startswith('<') and a.endswith('>') and \
                b.startswith('</') and b.endswith('>') and a[1:-1] == b[2:-1]:
            next(pairs)
            boo = new_boo
            boo.extend(a for a, _ in pairs)
            break
        new_boo.append(a)
    else:
        break
print(boo)
This outputs:
[]
With boo = ['<a>', '<b>', '<c>', '</c>', '</b>', '</a>', '<d>'], it outputs:
['<d>']
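The rescanning loop above can also be sketched as a single pass with a stack. This is a different technique from the zip approach in this answer, shown only as an alternative; the function name strip_matched_pairs is hypothetical, and the sketch assumes ordinary tags of the form '<name>' and '</name>'.

```python
def strip_matched_pairs(tags):
    """Return what is left after repeatedly removing adjacent <x>, </x> pairs."""
    stack = []
    for tag in tags:
        # A closing tag cancels the opening tag directly before it
        # when the two names match; anything else is kept.
        if (stack and tag.startswith('</') and tag.endswith('>')
                and stack[-1] == '<' + tag[2:-1] + '>'):
            stack.pop()
        else:
            stack.append(tag)
    return stack


print(strip_matched_pairs(['<a>', '<b>', '<c>', '</c>', '</b>', '</a>']))         # []
print(strip_matched_pairs(['<a>', '<b>', '<c>', '</c>', '</b>', '</a>', '<d>']))  # ['<d>']
```

Because removing an inner pair exposes the enclosing pair at the top of the stack, one pass is enough here and no rescanning is needed.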
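Answer 1: (score: 0)
In 99% of cases, you should not modify a list while you are iterating over it.
This solution makes a copy, iterates over the copy, and edits the original list: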
boo_copy = boo[:]
for b in boo_copy:
    # Strip '</', '<' and '>' to get the bare tag name
    stripped_tag = b.replace("</", "").replace(">", "").replace("<", "")
    i = boo.index(b)  # look up the current position, since earlier removals shift indices
    if i > 0 and boo[i] == '</' + stripped_tag + '>' and boo[i-1] == '<' + stripped_tag + '>':
        del boo[i-1:i+1]  # drop the opening tag and its closing tag together
print(boo)
This assumes the tags in the list are unique.
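To illustrate why editing a list during iteration is risky, here is a small sketch (the variable names items and seen are made up for this example). Removing an element shifts everything after it one position to the left, so the loop silently skips the element that slides into the slot it has already visited.

```python
items = ['<a>', '</a>', '<b>', '</b>']
seen = []
for item in items:
    seen.append(item)
    if item == '<a>':
        items.remove(item)  # every later element moves one index to the left

# '</a>' moved into index 0, which the loop had already passed,
# so it is never visited.
print(seen)   # ['<a>', '<b>', '</b>']
print(items)  # ['</a>', '<b>', '</b>']
```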
Answer 2: (score: 0)
import re
def open_tag_as_str(tag):
    m = re.match(r'^<(\w+)>$', tag)
    return None if m is None else m.group(1)
def close_tag_as_str(tag):
    m = re.match(r'^</(\w+)>$', tag)
    return None if m is None else m.group(1)
def remove_adjacent_tags(tags):
    def closes(a, b):
        a = open_tag_as_str(a)
        b = close_tag_as_str(b)
        return a is not None and b is not None and a == b
    # This is a bit ugly and could probably be improved with
    # some itertools magic or something
    skip = False
    for i in range(len(tags)):
        if skip:
            skip = False
        elif i + 1 < len(tags) and closes(tags[i], tags[i + 1]):
            skip = True
        else:
            yield tags[i]
boo = ['<a>', '<b>', '<c>', '</c>', '</b>', '</a>']
boo = list(remove_adjacent_tags(boo))
print(boo)
This gives:
['<a>', '<b>', '</b>', '</a>']
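The output above comes from a single pass, which removes only the innermost pair. Since the question asks to keep deleting until nothing more can be removed, one possible extension is to repeat the pass until the list stops shrinking. The sketch below is self-contained, so it restates the single pass in compact form; tag_name and remove_until_stable are hypothetical names that are not part of the original answer.

```python
import re


def tag_name(tag, closing=False):
    """Return the tag's name, or None if `tag` is not of the requested kind."""
    pattern = r'</(\w+)>' if closing else r'<(\w+)>'
    m = re.fullmatch(pattern, tag)
    return None if m is None else m.group(1)


def remove_adjacent_tags(tags):
    """One pass: drop each opening tag immediately followed by its closing tag."""
    skip = False
    for i, tag in enumerate(tags):
        if skip:
            # This element is the closing half of a pair found on the previous step.
            skip = False
            continue
        name = tag_name(tag)
        if (name is not None and i + 1 < len(tags)
                and name == tag_name(tags[i + 1], closing=True)):
            skip = True
        else:
            yield tag


def remove_until_stable(tags):
    """Repeat the single pass until it no longer shrinks the list."""
    while True:
        reduced = list(remove_adjacent_tags(tags))
        if len(reduced) == len(tags):
            return reduced
        tags = reduced


print(remove_until_stable(['<a>', '<b>', '<c>', '</c>', '</b>', '</a>']))  # []
print(remove_until_stable(['<a>', '<b>', '</a>', '</b>']))  # ['<a>', '<b>', '</a>', '</b>']
```

The second call shows a list that is left untouched, because no opening tag is directly followed by its own closing tag.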