Question

I'm trying to parse HTML with BeautifulSoup.

The content I want looks like this:

<a class="yil-biz-ttl" id="yil_biz_ttl-2" href="http://some-web-url/" title="some title">Title</a>

I tried the following and got this error:

maxx = soup.findAll("href", {"class: "yil-biz-ttl"})
------------------------------------------------------------
   File "<ipython console>", line 1
     maxx = soup.findAll("href", {"class: "yil-biz-ttl"})
                                             ^
SyntaxError: invalid syntax

What I want is the string: http://some-web-url/

Answer 1

You're missing the closing quote after "class:

 maxx = soup.findAll("href", {"class: "yil-biz-ttl"})

should be

 maxx = soup.findAll("href", {"class": "yil-biz-ttl"})

Also, I don't think you can search for an attribute such as href this way. You need to search for the tag and then read its href:

 maxx = [link['href'] for link in soup.findAll("a", {"class": "yil-biz-ttl"})]
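A self-contained sketch of the list comprehension above, assuming the markup from the question is held in a string. The variable name html and the choice of the built-in "html.parser" backend are illustrative assumptions, not part of the original answer:

```python
from bs4 import BeautifulSoup

# Sample markup copied from the question
html = '<a class="yil-biz-ttl" id="yil_biz_ttl-2" href="http://some-web-url/" title="some title">Title</a>'

soup = BeautifulSoup(html, "html.parser")

# Search by tag name "a", filter on the CSS class, then read each tag's href attribute
maxx = [link['href'] for link in soup.findAll("a", {"class": "yil-biz-ttl"})]
print(maxx)  # ['http://some-web-url/']
```

The result is a list because findAll can match several tags; take maxx[0] if you only expect one.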

Answer 2

soup.findAll('a', {'class': 'yil-biz-ttl'})[0]['href']

To find all such links:

for link in soup.findAll('a', {'class': 'yil-biz-ttl'}):
    try:
        print(link['href'])
    except KeyError:
        pass  # skip matching links that have no href attribute
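To show why the try/except KeyError is there, here is a sketch run against hypothetical markup in which one matching <a> has no href (the second link is made up for illustration). Indexing link['href'] on that tag raises KeyError, which the loop skips. Tag.get('href') is the dictionary-style alternative that returns None instead of raising:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the second matching <a> has no href attribute
html = """
<a class="yil-biz-ttl" href="http://some-web-url/">Title</a>
<a class="yil-biz-ttl">No link</a>
"""
soup = BeautifulSoup(html, "html.parser")

hrefs = []
for link in soup.findAll('a', {'class': 'yil-biz-ttl'}):
    try:
        hrefs.append(link['href'])  # raises KeyError if the tag has no href
    except KeyError:
        pass  # skip tags without an href

# Tag.get() returns None for a missing attribute instead of raising
with_get = [link.get('href') for link in soup.findAll('a', {'class': 'yil-biz-ttl'})]

print(hrefs)     # ['http://some-web-url/']
print(with_get)  # ['http://some-web-url/', None]
```

Note also that the one-liner soup.findAll(...)[0]['href'] raises IndexError when nothing matches, so the loop form is the safer choice on pages where the link may be absent.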

Answer 3

To find all <a/> elements that have the CSS class "yil-biz-ttl" and an href attribute:

from bs4 import BeautifulSoup  # $ pip install beautifulsoup4

soup = BeautifulSoup(HTML, "html.parser")  # HTML: the markup to parse, as a string
for link in soup("a", "yil-biz-ttl", href=True):  # <a> tags with this class AND an href
    print(link['href'])
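A sketch of how the filter behaves, using hypothetical markup that covers the three cases it has to tell apart (the extra links and the "other" class are made up for illustration). Calling soup(...) is shorthand for soup.find_all(...); a plain string as the second argument is matched against the CSS class, and href=True keeps only tags that have the attribute. The select() line is an alternative not in the original answer: it uses a CSS selector and relies on the soupsieve package, which recent beautifulsoup4 releases install as a dependency.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: only the first link has both the class and an href
html = """
<a class="yil-biz-ttl" href="http://some-web-url/">kept</a>
<a class="yil-biz-ttl">no href, skipped</a>
<a class="other" href="http://other-url/">wrong class, skipped</a>
"""
soup = BeautifulSoup(html, "html.parser")

# Class given as a positional string, href=True requires the attribute to exist
found = [link['href'] for link in soup("a", "yil-biz-ttl", href=True)]

# Equivalent CSS selector: <a> with class yil-biz-ttl that has an href attribute
selected = [link['href'] for link in soup.select("a.yil-biz-ttl[href]")]

print(found)     # ['http://some-web-url/']
print(selected)  # ['http://some-web-url/']
```

Because href=True already excludes tags without the attribute, link['href'] cannot raise KeyError here, which is why this answer needs no try/except.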

At the moment, none of the other answers satisfy both of these requirements.

Answer 4

First, you have a syntax error: the quotes around class are wrong.

Try:

maxx = soup.findAll("href", {"class": "yil-biz-ttl"})
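Fixing the quote removes the SyntaxError, but the line still treats "href" as a tag name, as Answer 1 points out. A quick check against the sample markup from the question (the html variable and "html.parser" backend are illustrative assumptions) shows the difference:

```python
from bs4 import BeautifulSoup

html = '<a class="yil-biz-ttl" id="yil_biz_ttl-2" href="http://some-web-url/" title="some title">Title</a>'
soup = BeautifulSoup(html, "html.parser")

# The first argument is a tag name, so this looks for <href> elements and finds none
maxx = soup.findAll("href", {"class": "yil-biz-ttl"})
print(len(maxx))  # 0

# Searching for <a> tags instead returns the element, whose href can then be read
links = soup.findAll("a", {"class": "yil-biz-ttl"})
print(links[0]['href'])  # http://some-web-url/
```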
