Question

Consider the following situation:

tag1 = soup.find(**data_attrs)
tag2 = soup.find(**delim_attrs)

Is there a way to find out which tag occurs "first" in the page?

Clarifications:

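For my purposes, the order is the same as the one BeautifulSoup's findNext method uses. (I am currently relying on this fact to "solve" my problem, although it is rather hacky.)
The goal here is basically to accumulate tags that are not separated by a "delimiter tag". Is there perhaps a better way to go about it?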

Answer 1

No, BeautifulSoup tags do not track their order in the page. You would have to loop over all the tags again and find your two tags in that list.

Using the standard sample BeautifulSoup tree:

>>> tag1 = soup.find(id='link1')
>>> tag2 = soup.find(id='link2')
>>> tag1, tag2
(<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>, <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>)
>>> all_tags = soup.find_all(True)
>>> all_tags.index(tag1)
6
>>> all_tags.index(tag2)
7

Instead, I would use tag.find_all() with a function that matches both tag types; that way you get a single list of tags and can see their relative order:

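# Placeholder names: replace the tag names, attribute and class with your own.
tag_match = lambda el: (
    getattr(el, 'name', None) in ('tagname1', 'tagname2') and
    el.attrs.get('attributename') == 'something' and
    # default to an empty list so tags without a class do not raise TypeError
    'classname' in el.attrs.get('class', [])
)
# find_all() returns every match in document order; find() would return only the first
tags = soup.find_all(tag_match)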
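To make the idea concrete, here is a minimal sketch of that approach. The HTML, the class names `data` and `delim`, and the helpers `is_data_or_delim` and `position` are made up for illustration; they are not from the question. The lookup compares by identity (`is`) rather than using `list.index`, on the assumption that two distinct tags with identical markup compare equal in BeautifulSoup, which could make `index` return the wrong position.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: "data" tags separated by a "delim" tag.
html = """
<div>
  <p class="data">one</p>
  <p class="data">two</p>
  <hr class="delim"/>
  <p class="data">three</p>
</div>
"""
soup = BeautifulSoup(html, "html.parser")


def is_data_or_delim(tag):
    """Match both kinds of tag in a single pass."""
    classes = tag.get("class") or []  # tags without a class give None
    return "data" in classes or "delim" in classes


# find_all() walks the tree once and returns matches in document order.
matches = soup.find_all(is_data_or_delim)


def position(tag, tags):
    """Index of `tag` in `tags`, compared by identity rather than equality."""
    for i, candidate in enumerate(tags):
        if candidate is tag:
            return i
    raise ValueError("tag not found in list")


tag1 = soup.find(class_="data")
tag2 = soup.find(class_="delim")
tag1_comes_first = position(tag1, matches) < position(tag2, matches)
```

Whichever tag has the smaller position occurs first in the page, and the same list can be scanned once to split the data tags at each delimiter.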

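Alternatively, you can use the .next_siblings iterator to loop over all elements in the same parent and check whether a delimiter comes next, and so on.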
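A rough sketch of the sibling-based variant follows, assuming the data tags and delimiter tags all share one parent. The markup, the `delim` class and the helper names `is_delim` and `group_between_delimiters` are hypothetical. Text nodes between the tags (whitespace, for example) also show up in `.next_siblings`, so they are skipped.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical markup: siblings under one parent, split by a delimiter tag.
html = (
    "<div>"
    "<p class='data'>one</p>"
    "<p class='data'>two</p>"
    "<hr class='delim'/>"
    "<p class='data'>three</p>"
    "</div>"
)
soup = BeautifulSoup(html, "html.parser")


def is_delim(tag):
    """True for the (assumed) delimiter tags."""
    return "delim" in (tag.get("class") or [])


def group_between_delimiters(first):
    """Collect runs of sibling tags, starting a new run at each delimiter."""
    groups, current = [], []
    for node in [first, *first.next_siblings]:
        if not isinstance(node, Tag):
            continue  # skip NavigableString nodes such as whitespace
        if is_delim(node):
            if current:
                groups.append(current)
            current = []
        else:
            current.append(node)
    if current:
        groups.append(current)
    return groups


first = soup.div.find(True)  # first tag inside the parent
groups = group_between_delimiters(first)
texts = [[tag.get_text() for tag in group] for group in groups]
```

This only looks at direct siblings of the starting tag; if the delimiters can be nested at other depths, the find_all() approach is probably the better fit.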

Order of appearance of BeautifulSoup tags
