Question

I use the following code to match all divs that have the CSS class "ad_item".

soup.find_all('div',class_="ad_item")

The problem is that the same web page also contains divs whose CSS class is set to both "ad_item" and "ad_ex_item".

<div class="ad_item ad_ex_item">

The documentation states:

When you search for a tag that matches a certain CSS class, you're matching against any of its CSS classes:
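To make the documented behavior concrete, here is a minimal sketch. The markup is hypothetical (it is not the real page), and the built-in "html.parser" backend is an assumption chosen so that no extra parser is needed.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one div with a single class, one with two classes.
html = """
<div class="ad_item">plain</div>
<div class="ad_item ad_ex_item">extended</div>
"""
soup = BeautifulSoup(html, "html.parser")

# class_ matches when ANY of the tag's classes equals "ad_item",
# so both divs come back.
matches = soup.find_all("div", class_="ad_item")
for div in matches:
    print(div["class"], div.get_text())
```

Both divs are returned, which is exactly the problem described here.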

So how do I match a div that has only "ad_item" and does not have "ad_ex_item"?

Or, to put it another way, how do I search for divs whose only CSS class is "ad_item"?

Answer 1

I found one solution, although it has nothing to do with BS4 itself; it is pure Python code.

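for item in soup.find_all('div', class_="ad_item"):
    # Skip any div that carries more than one CSS class.
    if len(item["class"]) != 1:
        continue
    # At this point item has exactly one class, "ad_item"; handle it here.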

It basically skips the item if it has more than one CSS class.
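A self-contained version of that loop might look like the sketch below. The markup and the only_ad_item list are assumptions added for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page.
html = '<div class="ad_item">a</div><div class="ad_item ad_ex_item">b</div>'
soup = BeautifulSoup(html, "html.parser")

only_ad_item = []
for item in soup.find_all("div", class_="ad_item"):
    # item["class"] is a list of class names; more than one entry
    # means the div has some class besides "ad_item".
    if len(item["class"]) != 1:
        continue
    only_ad_item.append(item)

print([div.get_text() for div in only_ad_item])
```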

Answer 2

You can pass a lambda function to the find and find_all methods.

soup.find_all(lambda x:
    x.name == 'div' and
    'ad_item' in x.get('class', []) and
    'ad_ex_item' not in x['class']
)

x.get('class', []) avoids a KeyError exception for div tags that have no class attribute.

If you need to exclude more than one class, you can replace the last condition with:

    not any(c in x['class'] for c in {'ad_ex_item', 'another_class'})

And if you only want to exclude tags that carry all of the listed classes at once, you can use:

   not all(c in x['class'] for c in {'ad_ex_item', 'another_class'})
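The difference between the any and all variants is easy to miss, so here is a small sketch comparing them. The markup, the excluded set, and the has_ad_item helper are hypothetical names introduced only for this example.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: zero, one, and both of the excluded classes.
html = """
<div class="ad_item">none</div>
<div class="ad_item ad_ex_item">one</div>
<div class="ad_item ad_ex_item another_class">both</div>
"""
soup = BeautifulSoup(html, "html.parser")
excluded = {"ad_ex_item", "another_class"}

def has_ad_item(tag):
    # True for divs whose class list contains "ad_item".
    return tag.name == "div" and "ad_item" in tag.get("class", [])

# "not any": reject a div as soon as it has ONE of the excluded classes.
no_excluded = soup.find_all(
    lambda x: has_ad_item(x) and not any(c in x["class"] for c in excluded)
)

# "not all": reject a div only when it has EVERY excluded class.
not_all_excluded = soup.find_all(
    lambda x: has_ad_item(x) and not all(c in x["class"] for c in excluded)
)

print([d.get_text() for d in no_excluded])
print([d.get_text() for d in not_all_excluded])
```

With this markup the first query keeps only the div with no excluded class, while the second also keeps the div that has just one of them.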

Answer 3

You can use a strict condition like this:

soup.select("div[class='ad_item']")

This selects divs whose class attribute is exactly the given string: in this case only 'ad_item', with no other classes joined to it by spaces.
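A quick sketch of the attribute selector against hypothetical markup, assuming a Beautiful Soup version whose select() accepts [attr='value'] selectors:

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page.
html = """
<div class="ad_item">exact</div>
<div class="ad_item ad_ex_item">two classes</div>
"""
soup = BeautifulSoup(html, "html.parser")

# [class='ad_item'] compares the whole attribute value, not individual classes,
# so the div with two classes is not selected.
exact = soup.select("div[class='ad_item']")
print([d.get_text() for d in exact])
```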

Answer 4

Have you tried using select: http://www.crummy.com/software/BeautifulSoup/bs4/doc/#css-selectors

soup.select(".ad_item")

Unfortunately, it seems that CSS3's :not selector is not supported. If you really need it, you may have to look at lxml, which seems to support it. See http://packages.python.org/cssselect/#supported-selectors
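For readers on a newer release: Beautiful Soup 4.7.0 and later delegate select() to the Soup Sieve package, which does handle :not. A minimal sketch, assuming such a version is installed and using hypothetical markup:

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page.
html = """
<div class="ad_item">plain</div>
<div class="ad_item ad_ex_item">extended</div>
"""
soup = BeautifulSoup(html, "html.parser")

# Divs that have class ad_item but not class ad_ex_item.
# Assumes bs4 >= 4.7.0; older releases may reject the :not selector.
plain = soup.select("div.ad_item:not(.ad_ex_item)")
print([d.get_text() for d in plain])
```

Note that this excludes one named class; a div with some other extra class would still be selected.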

Answer 5

You can always write a Python function that matches the tag you want, and pass that function to find_all():

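def match(tag):
    # Default to an empty list so tags without a class attribute
    # do not raise a TypeError on the membership tests.
    classes = tag.get('class', [])
    return (
        tag.name == 'div'
        and 'ad_item' in classes
        and 'ad_ex_item' not in classes)

soup.find_all(match)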
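If the goal is "exactly this set of classes and nothing else", the same idea can be generalized by comparing sets. This is only a sketch: has_exactly is a hypothetical helper, not part of Beautiful Soup, and the markup is made up.

```python
from bs4 import BeautifulSoup

def has_exactly(classes):
    """Build a matcher for divs whose class set equals `classes` (hypothetical helper)."""
    wanted = set(classes)

    def match(tag):
        # Tags without a class attribute compare as the empty set.
        return tag.name == "div" and set(tag.get("class", [])) == wanted

    return match

# Hypothetical markup standing in for the real page.
html = """
<div class="ad_item">only</div>
<div class="ad_item ad_ex_item">extended</div>
<div>no class</div>
"""
soup = BeautifulSoup(html, "html.parser")

result = soup.find_all(has_exactly(["ad_item"]))
print([d.get_text() for d in result])
```

Unlike a check for one specific unwanted class, this also rejects divs that carry any other extra class.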

Answer 6

The top answer is correct, but if you want to keep the for loop clean, or you like one-line solutions, use the list comprehension below.

data = [item for item in soup.find_all("div", class_="ad_item") if len(item["class"]) == 1]

Answer 7

soup.find_all('div', {'class': 'ad_item'})

How to make Beautiful Soup (bs4) match just one, and only one, CSS class