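I want to loop over every URL in this list (https://express-press-release.net/Industries/Automotive-press-releases.php), copy the data from each page, and then return to the root listing to get the next URL. I can scrape a single page, but I can't work out how to scrape across multiple links.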
Answer 0 (score: 1)
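You can find all the <a> tags that have an href attribute and put them into a list, then simply iterate over that list. You will probably need to add some further filtering, since you most likely only want specific links, but this should get you going: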
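import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

url = 'https://express-press-release.net/Industries/Automotive-press-releases.php'
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')

# Collect every <a> tag that carries an href attribute.
links = soup.find_all('a', href=True)

# Keep only the relative links containing '..' and resolve them
# against the listing page URL to get absolute URLs.
link_list = [urljoin(url, a['href']) for a in links if '..' in a['href']]

for link in link_list:
    page = requests.get(link)  # fetch each linked page in turn
    # do some stuff...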
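The "other kinds of filters" mentioned above could look something like the following. This is only a sketch under assumptions: the helper name `select_links` and its `suffix` parameter are hypothetical, and the rule it applies (same host, path ending in `.php`, no duplicates) is a guess at what "specific links" means for this site, so adjust it to whatever the actual pages need. It uses only the standard library and works on a plain list of href strings.

```python
from urllib.parse import urljoin, urlparse


def select_links(hrefs, base_url, suffix='.php'):
    """Resolve hrefs against base_url and keep unique, same-host links
    whose path ends with suffix (hypothetical filter rule)."""
    base_host = urlparse(base_url).netloc
    seen = set()
    selected = []
    for href in hrefs:
        absolute = urljoin(base_url, href)  # handles '../' style relative links
        parsed = urlparse(absolute)
        if parsed.netloc != base_host:
            continue  # skip external sites, mailto: links, etc.
        if not parsed.path.endswith(suffix):
            continue  # skip anything that is not a page of the expected type
        if absolute == base_url or absolute in seen:
            continue  # skip the listing page itself and duplicates
        seen.add(absolute)
        selected.append(absolute)
    return selected
```

With the snippet above, it would be called as `link_list = select_links([a['href'] for a in links], url)`, and the loop over `link_list` stays the same.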