Question

I need to get the text "Art" and "Biography" from the following href links:

<a class="gr-hyperlink" href="/genres/art">Art</a>,
 <a class="gr-hyperlink" href="/genres/biography">Biography</a>,

Here is my code:

import numpy as np
import pandas as pd
from urllib.request import urlopen  # in Python 3, urlopen lives in urllib.request
from bs4 import BeautifulSoup
import re

def getHTMLContent(link):
    html = urlopen(link)
    soup = BeautifulSoup(html, 'html.parser')
    return soup

content = getHTMLContent('https://abc')
hyperLinks = content.find_all('a', class_="gr-hyperlink")
hyperLinks

Answer 1

Calling find_all on a BeautifulSoup object returns a ResultSet, which is iterable.
Each item in the ResultSet is a BeautifulSoup Tag object.

Use the Tag's get_text method to extract its text:

content = [link.get_text() for link in hyperLinks]
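
To check this without hitting the network, here is a minimal sketch that parses the two anchors from the question as a literal string. The variable names (html, links, genres, hrefs) are only illustrative, and it assumes the real page uses the same markup as the snippet above.

```python
from bs4 import BeautifulSoup

# Sample markup copied from the question; a real page would come from urlopen().
html = '''
<a class="gr-hyperlink" href="/genres/art">Art</a>,
<a class="gr-hyperlink" href="/genres/biography">Biography</a>,
'''

soup = BeautifulSoup(html, 'html.parser')

# find_all returns a ResultSet of Tag objects matching the tag name and class.
links = soup.find_all('a', class_='gr-hyperlink')

# Visible text of each link.
genres = [link.get_text(strip=True) for link in links]

# The href attribute itself, in case the path is needed rather than the text.
hrefs = [link.get('href') for link in links]

print(genres)  # ['Art', 'Biography']
print(hrefs)   # ['/genres/art', '/genres/biography']
```

If you only want the link text, get_text is enough; link.get('href') is there in case you need the URL path as well.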
