Python list: search, compare and eliminate elements

Time: 2015-04-07 07:13:46

Tags: python list search compare

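I want to get all the elements that have no pair. This is a list of XML tags read from top to bottom, with the angle brackets removed. I want to find the pairs (for example the opening tag note and the closing tag /note), remove them from the list, and be left with the tags that have no pair.

How do I iterate over the list, compare each tag with all the other tags, and say, for example: aha, I found this tag's counterpart, the one that starts with a forward slash?

Thanks.

Any other, better, ideas for finding the unmatched tags?

PS: I do want to keep the order of the list and, if possible, use equality when comparing a tag with another tag in the list. If the 'in' operator is used it won't work, because if a tag name is a single letter such as 'a', the search will return all elements that contain an a rather than an exact match for 'a'.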

tags = ['note', 'to', 'bbb', 'bbb', 'firstname', '/firstname', 'lastname', '/lastname', 'from', 'hello', 'hello', 'hello', 'hello', 'hello', 'l', '/from', '/to', 'elephant', 'll', 'from', '/from', 'a1', 'img', 'a2', 'from', 'from', '/from', '/from', '/a2', '/img', '/a1', 'heading', '/heading', 'body', '/body', '/note']

2 answers:

Answer 0 (score: 0)

You can create a set of all the closing tags and then use that set to filter the tags.

>>> closing = set([t for t in tags if t.startswith("/")])
>>> [t for t in tags if "/" + t not in closing and t not in closing]
['bbb', 'bbb', 'hello', 'hello', 'hello', 'hello', 'hello', 'l', 'elephant', 'll']

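Note, however, that this does not really respect "pairs" of tags; it only checks whether a "closing" variant of the same tag exists anywhere in the list. For example, given tags = ["a", "a", "/a"] or tags = ["a", "/a", "a"], it will remove all instances of a from the list.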
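The caveat can be checked directly. The sketch below wraps the two lines from this answer in a helper (the name filter_by_closing_set is made up for illustration, it is not part of the answer) and runs it on the two lists mentioned above. The third call touches on the PS in the question: membership tests on a set or list compare whole elements for equality, so a one-letter tag such as 'a' is not confused with 'ab'.

```python
def filter_by_closing_set(tags):
    """Hypothetical wrapper around the set-based filter from this answer."""
    # Collect every closing tag once; set membership compares whole strings.
    closing = {t for t in tags if t.startswith("/")}
    # Drop closing tags themselves and any tag whose "/name" variant exists.
    return [t for t in tags if "/" + t not in closing and t not in closing]


print(filter_by_closing_set(["a", "a", "/a"]))    # [] : both 'a' entries are dropped
print(filter_by_closing_set(["a", "/a", "a"]))    # [] : same result, order is ignored
print(filter_by_closing_set(["a", "ab", "/ab"]))  # ['a'] : 'a' is not matched by '/ab'
```

So the filter is exact with respect to tag names, but it counts nothing: a single closing tag removes every opening tag of that name.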

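Answer 1 (score: 0)

The first part of the program extracts all the tags into a list. If you look closely, this is the same problem as finding unmatched brackets. It can be solved by treating the list as a stack and working out, while iterating, which tags are faulty.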

import re

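def clean_attr(attr):
    # Drop any attributes so that '<book id="bk101">' becomes '<book>';
    # tags without attributes (including closing tags) are returned unchanged.
    attr_list = re.split(r'\s+', attr)
    if len(attr_list) == 1:
        return attr
    else:
        return attr_list[0] + '>'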

line="""
<?xml version="1.0"?>
<catalog>
   <book id="bk101">
      <author>Gambardella, Matthew</author>
      <title>XML Developer's Guide</title>
      <genre>Computer</genre>
      <price>44.95</price>
      <publish_date>2000-10-01</publish_date>
      <description>An in-depth look at creating applications 
      with XML.</description>
   </book>
   <book id="bk102">
      <author>Ralls, Kim</author>
      <title>Midnight Rain</title>
      <genre>Fantasy</genre>
      <price>5.95</price>
      <publish_date>2000-12-16</publish_date>
      <description>A former architect battles corporate zombies, 
      an evil sorceress, and her own childhood to become queen 
      of the world.</description>
   </book>
   <book id="bk103">
      <author>Corets, Eva</author>
      <title>Maeve Ascendant</title>
      <genre>Fantasy</genre>
      <price>5.95</price>
      <publish_date>2000-11-17</publish_date>
      <description>After the collapse of a nanotechnology 
      society in England, the young survivors lay the 
      foundation for a new society.</description>
   </book>
   <book id="bk104">
      <author>Corets, Eva</author>
      <title>Oberon's Legacy</title>
      <genre>Fantasy</genre>
      <price>5.95</price>
      <publish_date>2001-03-10</publish_date>
      <description>In post-apocalypse England, the mysterious 
      agent known only as Oberon helps to create a new life 
      for the inhabitants of London. Sequel to Maeve 
      Ascendant.</description>
   </book>
   <book id="bk105">
      <author>Corets, Eva</author>
      <title>The Sundered Grail</title>
      <genre>Fantasy</genre>
      <price>5.95</price>
      <publish_date>2001-09-10</publish_date>
      <description>The two daughters of Maeve, half-sisters, 
      battle one another for control of England. Sequel to 
      Oberon's Legacy.</description>
   </book>
   <book id="bk106">
      <author>Randall, Cynthia</author>
      <title>Lover Birds</title>
      <genre>Romance</genre>
      <price>4.95</price>
      <publish_date>2000-09-02</publish_date>
      <description>When Carla meets Paul at an ornithology 
      conference, tempers fly as feathers get ruffled.</description>
   </book>
   <book id="bk107">
      <author>Thurman, Paula</author>
      <title>Splish Splash</title>
      <genre>Romance</genre>
      <price>4.95</price>
      <publish_date>2000-11-02</publish_date>
      <description>A deep sea diver finds true love twenty 
      thousand leagues beneath the sea.</description>
   </book>
   <book id="bk108">
      <author>Knorr, Stefan</author>
      <title>Creepy Crawlies</title>
      <genre>Horror</genre>
      <price>4.95</price>
      <publish_date>2000-12-06</publish_date>
      <description>An anthology of horror stories about roaches,
      centipedes, scorpions  and other insects.</description>
   </book>
   <book id="bk109">
      <author>Kress, Peter</author>
      <title>Paradox Lost</title>
      <genre>Science Fiction</genre>
      <price>6.95</price>
      <publish_date>2000-11-02</publish_date>
      <description>After an inadvertant trip through a Heisenberg
      Uncertainty Device, James Salway discovers the problems 
      of being quantum.</description>
   </book>
   <book id="bk110">
      <author>O'Brien, Tim</author>
      <title>Microsoft .NET: The Programming Bible</title>
      <genre>Computer</genre>
      <price>36.95</price>
      <publish_date>2000-12-09</publish_date>
      <description>Microsoft's .NET initiative is explored in 
      detail in this deep programmer's reference.</description>
   </book>
      <author>O'Brien, Tim</author>
      <title>MSXML3: A Comprehensive Guide</title>
      <genre>Computer</genre>
      <price>36.95</price>
      <publish_date>2000-12-01</publish_date>
      <description>The Microsoft MSXML3 parser is covered in 
      detail, with attention to XML DOM interfaces, XSLT processing, 
      SAX and more.</description>
   </book>
   <book id="bk112">
      <author>Galos, Mike</author>
      <title>Visual Studio 7: A Comprehensive Guide</title>
      <genre>Computer</genre>
      <price>49.95</price>
      <publish_date>2001-04-16</publish_date>
      <description>Microsoft Visual Studio 7 is explored in depth,
      looking at how Visual Basic, Visual C++, C#, and ASP+ are 
      integrated into a comprehensive development 
      environment.
   </book>
</catalog>

"""
# Opening tags (optionally with attributes) and closing tags, each on their own.
# These two lists are not used below; all_attrs keeps both kinds in document order.
attr_open = re.findall(r'<[\w+\s=\"]+>', line)
attr_closed = re.findall(r'<\/\w+>', line)
all_attrs = re.findall(r'<[\w+\s=\"]+>|<\/\w+>', line)

all_attrs_cleaned = [clean_attr(attr) for attr in all_attrs]

list_as_stack = []  # opening tags still waiting for their closing tag
not_closed = []     # closing tags that did not match the tag on top of the stack

for an_attr in all_attrs_cleaned:
    if not an_attr.startswith('</'):
        list_as_stack.append(an_attr)
    elif list_as_stack and (re.search(r'\w+', list_as_stack[-1]).group(0)
                            == re.search(r'\w+', an_attr).group(0)):
        # The closing tag matches the most recent opening tag: the pair is complete.
        list_as_stack.pop()
    else:
        # Empty stack or a different tag on top: record the stray closing tag.
        not_closed.append(an_attr)

print(list_as_stack)
print(not_closed)

In the program above, the first list tells you which tags were not closed, and the second list tells you which closing tags have no opening tag.
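The same stack idea can be applied directly to the flat tags list from the question, without going through raw XML. The following is a minimal sketch, not the program above: the function name unmatched_tags is hypothetical, and unlike the program above it searches down the stack for the nearest open tag with the same name instead of comparing only the top of the stack. That is an assumption about what "pair" should mean when tags are not properly nested. Names are compared with ==, and the original order is preserved.

```python
def unmatched_tags(tags):
    """Return the tags that have no partner, in their original order (sketch)."""
    open_stack = []   # indices of opening tags still waiting for a partner
    leftover = set()  # indices of closing tags that found no opening tag

    for i, tag in enumerate(tags):
        if not tag.startswith("/"):
            open_stack.append(i)
            continue
        name = tag[1:]
        # Walk from the top of the stack down to the most recent matching opener.
        for pos in range(len(open_stack) - 1, -1, -1):
            if tags[open_stack[pos]] == name:  # exact equality, no substring search
                del open_stack[pos]
                break
        else:
            leftover.add(i)  # no opener with this name is waiting

    leftover.update(open_stack)  # openers that were never closed
    return [tag for i, tag in enumerate(tags) if i in leftover]


tags = ['note', 'to', 'bbb', 'bbb', 'firstname', '/firstname', 'lastname',
        '/lastname', 'from', 'hello', 'hello', 'hello', 'hello', 'hello', 'l',
        '/from', '/to', 'elephant', 'll', 'from', '/from', 'a1', 'img', 'a2',
        'from', 'from', '/from', '/from', '/a2', '/img', '/a1', 'heading',
        '/heading', 'body', '/body', '/note']

print(unmatched_tags(tags))
# ['bbb', 'bbb', 'hello', 'hello', 'hello', 'hello', 'hello', 'l', 'elephant', 'll']
print(unmatched_tags(["a", "a", "/a"]))  # ['a'] : only one 'a' is consumed by '/a'
```

Because each closing tag consumes exactly one opening tag, a list such as ["a", "a", "/a"] keeps one a, which is where this sketch differs from the set-based filter in Answer 0.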