Question

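As the title says, I'm trying to get the title of a link that sits inside a table cell. This is the website I'm getting my data from. I've also looked at this question, which is where the last few lines of my code came from, but it doesn't quite do it for me.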

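I'm trying to get the title of the link inside the first column (that is, the first cell of each row). I can get all the HTML in the cell, but I'm having trouble getting just the title. Here is what I've come up with so far: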

import requests
from bs4 import BeautifulSoup

URL = 'http://theescapists.gamepedia.com/Crafting'
get_page = requests.get(URL)
plain_text = get_page.text
soup = BeautifulSoup(plain_text, 'html.parser')


for table_tag in soup.find_all('table'):
    for each_row in table_tag.find_all('tr'):
        links = each_row.find('a', href=True)
        title = links.get('title')
        print(title)
        print('')

If I print just links, it prints all the HTML of the link in each cell.

When I print title, I get the error AttributeError: 'NoneType' object has no attribute 'get'. This confuses me, because print(type(links)) gives me bs4.element.Tag, so I expected to be able to read its title attribute.

To recap (this has gotten a bit long), I want to get the title attribute of the link in the first cell of each row, in every table.

Answer 1

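A tr tag can contain only th tags and no a tag (the header row, for example), in which case find() returns None. Check for the a tag before accessing it: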

for table_tag in soup.find_all('table'):
    for each_row in table_tag.find_all('tr'):
        links = each_row.find('a', href=True)
        if links:  # check before you access
            title = links.get('title')
            print(title)
            print('')
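
To see the check in action without hitting the live page, here is a minimal sketch that runs against a small inline table. The HTML snippet, the item names, and the helper name first_cell_link_titles are all made up for illustration. The sketch also narrows the search to the first td of each row, since the question asks only for the first cell.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the wiki page: a header row with no link,
# then two data rows whose first cell holds a link.
SAMPLE_HTML = """
<table>
  <tr><th>Item</th><th>Recipe</th></tr>
  <tr><td><a href="/item_a" title="Example Item A">A</a></td>
      <td><a href="/part_x" title="Example Part X">X</a></td></tr>
  <tr><td><a href="/item_b" title="Example Item B">B</a></td>
      <td>no link here</td></tr>
</table>
"""


def first_cell_link_titles(html):
    """Return the title of the first link in the first <td> of each row."""
    soup = BeautifulSoup(html, 'html.parser')
    titles = []
    for row in soup.find_all('tr'):
        first_cell = row.find('td')  # None for the header row (only <th>)
        if first_cell is None:
            continue
        link = first_cell.find('a', href=True)
        if link is None:  # first cell has no link
            continue
        titles.append(link.get('title'))
    return titles


print(first_cell_link_titles(SAMPLE_HTML))
```

The header row is skipped because it has no td, and the links in the second column are ignored because only the first cell of each row is searched.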

Answer 2

I think links.attrs['title'] is what you want.

My code:

for table_tag in soup.find_all('table'):
    for each_row in table_tag.find_all('tr'):
        links = each_row.find('a', href=True)
        try:
            title = links.attrs['title']
            print(title)
            print('')
        except (AttributeError, KeyError):  # links is None, or the link has no title
            pass

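Note: catching AttributeError handles the table's header row, where find() returns None because the row contains no link, so there is no title to read.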
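
The two lookups behave differently when a link has no title attribute, which can be checked on a tiny made-up snippet. In this sketch the markup and variable names are invented for illustration: .get('title') returns None for a missing attribute, while .attrs['title'] raises KeyError, so a handler that catches only AttributeError would not cover that case.

```python
from bs4 import BeautifulSoup

# Made-up markup: one link with a title attribute and one without.
soup = BeautifulSoup(
    '<a href="/a" title="First">A</a><a href="/b">B</a>', 'html.parser')
with_title, without_title = soup.find_all('a')

print(with_title.get('title'))     # attribute present
print(without_title.get('title'))  # attribute missing, so .get() returns None

# Dictionary-style access raises KeyError instead of returning None.
try:
    without_title.attrs['title']
    raised_key_error = False
except KeyError:
    raised_key_error = True
print(raised_key_error)
```

If a missing title should simply be skipped, .get('title') combined with a None check on the link avoids the need for exception handling.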

Getting the title of a link using BeautifulSoup
