Question

I have a list like this:

boo = ['<a>', '<b>', '<c>', '</c>', '</b>', '</a>']

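I'm trying to iterate over it and find the indices of matching pairs, e.g. '<c>' and '</c>', and delete those items. They must be adjacent to each other and match in order to be removed. After a pair is deleted, the list should be iterated over again, and the removal repeated until the list is empty or nothing more can be removed.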

I was thinking of something like:

  for i in range(len(boo)): 
    for b in boo:
       if  boo[i]== '</'+ b +'>' and boo[i-1] == '<' + b +'>':
         boo.remove(boo[i])
         boo.remove(boo[i-1])
         print(boo)

But this doesn't seem to do anything. Can someone point out my problem?

Edit

I changed it to something more like this, but it says that i is not defined. How do I define i?

def valid_html1(test_strings):
    valid = []
    for h in test_strings:
      boo = re.findall('\W+\w+\W', h)
      while i in boo == boo[i]:
         if boo[i][1:] == boo[i+1][2:]:
             boo.remove(boo[i])
             boo.remove(boo[i+1])
             print(boo)

valid_html1(example_set)

Answer 1

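Before comparing, you should parse the strings to extract the tag names from the angle brackets. You can use zip to pair each tag with its neighbor, and append an item to a new list only if it and the next item do not form a matching open/close pair: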

boo = ['<a>', '<b>', '<c>', '</c>', '</b>', '</a>']
while True:
    pairs = zip(boo, boo[1:] + [''])
    new_boo = []
    for a, b in pairs:
        if a.startswith('<') and a.endswith('>') and \
                b.startswith('</') and b.endswith('>') and a[1:-1] == b[2:-1]:
            next(pairs)
            boo = new_boo
            boo.extend(a for a, _ in pairs)
            break
        new_boo.append(a)
    else:
        break
print(boo)

This outputs:

[]

With boo = ['<a>', '<b>', '<c>', '</c>', '</b>', '</a>', '<d>'], the output is:

['<d>']
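
As an alternative to rescanning the list after every removal, the same end result can be reached in a single pass with a stack. This is a different technique from the zip loop above, shown only as a minimal sketch; the names reduce_with_stack, OPEN_TAG and CLOSE_TAG are made up for this illustration, and it assumes tags are plain '<name>' / '</name>' strings with no attributes.

```python
import re

# Hypothetical helper patterns: a bare opening tag and a bare closing tag
OPEN_TAG = re.compile(r'<(\w+)>')
CLOSE_TAG = re.compile(r'</(\w+)>')

def reduce_with_stack(tags):
    """Remove adjacent matching open/close pairs until none remain."""
    stack = []
    for tag in tags:
        close = CLOSE_TAG.fullmatch(tag)
        # Only the tag currently on top of the stack can be "adjacent"
        opening = OPEN_TAG.fullmatch(stack[-1]) if stack else None
        if close and opening and close.group(1) == opening.group(1):
            stack.pop()        # the pair cancels out
        else:
            stack.append(tag)  # keep the tag for now
    return stack

print(reduce_with_stack(['<a>', '<b>', '<c>', '</c>', '</b>', '</a>']))         # []
print(reduce_with_stack(['<a>', '<b>', '<c>', '</c>', '</b>', '</a>', '<d>']))  # ['<d>']
```

Removing a pair exposes the tag before it as the new top of the stack, so nested pairs collapse without restarting the scan.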

Answer 2

In 99% of cases, you should not edit a list while iterating over it.

This solution makes a copy, iterates over the copy, and edits the original list:

boo = ['<a>', '<b>', '<c>', '</c>', '</b>', '</a>']
boo_copy = boo[:]
for i, b in enumerate(boo_copy):
    if i == 0:
        continue

    # Strip '<', '</' and '>' to get the bare tag name
    stripped_tag = b.replace("</", "").replace(">", "").replace("<", "")
    # Compare against the copy: indices into boo shift once items are removed
    if boo_copy[i] == '</' + stripped_tag + '>' and boo_copy[i-1] == '<' + stripped_tag + '>':
        boo.remove(boo_copy[i])
        boo.remove(boo_copy[i-1])
        print(boo)

This assumes that tags are unique in the list.
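
To illustrate why editing a list while iterating over it is risky, here is a small sketch. The function names remove_while_iterating and remove_by_rebuilding are invented for this example; the first mutates the list it is looping over and silently skips an element, the second builds a new list instead.

```python
def remove_while_iterating(items, target):
    """Buggy on purpose: removes from the list that is being iterated."""
    items = list(items)  # work on a copy so the caller's list is untouched
    for item in items:
        if item == target:
            # Removing shifts later elements left, so the loop skips one
            items.remove(item)
    return items

def remove_by_rebuilding(items, target):
    """Safe variant: build a new list and leave the input alone."""
    return [item for item in items if item != target]

tags = ['<a>', '<b>', '<b>', '<c>']
print(remove_while_iterating(tags, '<b>'))  # ['<a>', '<b>', '<c>']  (one '<b>' survives)
print(remove_by_rebuilding(tags, '<b>'))    # ['<a>', '<c>']
```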

Answer 3

import re

def open_tag_as_str(tag):
    m = re.match(r'^<(\w+)>$', tag)
    return None if m is None else m.group(1)

def close_tag_as_str(tag):
    m = re.match(r'^</(\w+)>$', tag)
    return None if m is None else m.group(1)

def remove_adjacent_tags(tags):
    def closes(a, b):
        a = open_tag_as_str(a)
        b = close_tag_as_str(b)
        return a is not None and b is not None and a == b

    # This is a bit ugly and could probably be improved with
    # some itertools magic or something
    skip = False
    for i in range(len(tags)):
        if skip:
            skip = False
        elif i + 1 < len(tags) and closes(tags[i], tags[i + 1]):
            skip = True
        else:
            yield tags[i]

boo = ['<a>', '<b>', '<c>', '</c>', '</b>', '</a>']
boo = list(remove_adjacent_tags(boo))
print(boo)

Gives:

['<a>', '<b>', '</b>', '</a>']
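
The result above comes from a single pass, whereas the question asks to keep removing pairs until nothing changes. One way to get there is to repeat a one-pass removal until the list stops shrinking. The sketch below is self-contained, so it uses its own compact one-pass helper rather than the generator above; is_matching_pair, remove_pairs_once and remove_until_stable are hypothetical names chosen for this illustration.

```python
import re

def is_matching_pair(opening, closing):
    """True if opening is '<x>' and closing is '</x>' for the same name x."""
    m_open = re.fullmatch(r'<(\w+)>', opening)
    m_close = re.fullmatch(r'</(\w+)>', closing)
    return bool(m_open and m_close and m_open.group(1) == m_close.group(1))

def remove_pairs_once(tags):
    """One left-to-right pass that drops adjacent matching pairs."""
    result = []
    i = 0
    while i < len(tags):
        if i + 1 < len(tags) and is_matching_pair(tags[i], tags[i + 1]):
            i += 2  # skip both tags of the pair
        else:
            result.append(tags[i])
            i += 1
    return result

def remove_until_stable(tags):
    """Repeat the single pass until a pass removes nothing."""
    while True:
        reduced = remove_pairs_once(tags)
        if len(reduced) == len(tags):
            return reduced
        tags = reduced

boo = ['<a>', '<b>', '<c>', '</c>', '</b>', '</a>']
print(remove_pairs_once(boo))    # ['<a>', '<b>', '</b>', '</a>']
print(remove_until_stable(boo))  # []
```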
