Scraper does not extract the URL link

Time: 2019-02-14 04:56:26

Tags: python beautifulsoup screen-scraping

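Hello, I want to scrape the Amazon URL behind the "View Item on Amazon" link on this website.

My code is below, but it returns nothing. Any help is appreciated. Thanks.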

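import requests
from bs4 import BeautifulSoup

url = "https://app.jumpsend.com/deals/230513"

# Fetch the raw HTML as served, before any JavaScript runs
response = requests.get(url)
data = response.text

soup = BeautifulSoup(data, 'lxml')

# Collect every anchor tag found in the static HTML
tags = soup.find_all('a')

for tag in tags:
    print(tag.get('href'))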

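1 answer:

Answer 0 (score: 2):

The Amazon link (https://www.amazon.com/dp/B07MH9DK5B) is not in the HTML page source that requests receives. The link is added to the page by JavaScript, so you need a browser driver such as Selenium to get the HTML after the scripts have run: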

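from bs4 import BeautifulSoup
from selenium import webdriver

url = "https://app.jumpsend.com/deals/230513"
driver = webdriver.Firefox()  # launches a local Firefox instance
driver.get(url)
html = driver.page_source  # HTML after the browser has run the page's JavaScript
driver.quit()  # close the browser once the source has been captured
soup = BeautifulSoup(html, 'html.parser')
# Look up the anchor by its CSS class and read its href attribute
soup.find('a', attrs={'class': 'deal-modal-link'})['href']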

Run in an interactive session, the last expression above evaluates to the Amazon link:

'https://www.amazon.com/dp/B07MH9DK5B'
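
To illustrate why the requests-based attempt found nothing, here is a minimal, dependency-free sketch that runs the same lookup against two HTML strings. Both strings are made up for illustration and are not copied from the real page: `static_html` stands in for what a plain HTTP fetch might return, and `rendered_html` for what the browser holds after JavaScript has run. The `LinkCollector` class and `find_links` helper are hypothetical names, and the standard-library `html.parser` is used in place of BeautifulSoup only so the sketch runs without extra installs.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href values of <a> tags that carry a given CSS class."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        # The class attribute may hold several space-separated class names
        classes = (attr_map.get("class") or "").split()
        if self.css_class in classes and attr_map.get("href"):
            self.links.append(attr_map["href"])


def find_links(html, css_class="deal-modal-link"):
    """Return all hrefs of anchors with the given class in an HTML string."""
    parser = LinkCollector(css_class)
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical markup: an empty application shell, as a plain fetch might see it
static_html = (
    '<html><body><div id="app"></div>'
    '<script src="/app.js"></script></body></html>'
)

# Hypothetical markup: the same page after scripts have inserted the link
rendered_html = (
    '<html><body><div id="app">'
    '<a class="deal-modal-link" href="https://www.amazon.com/dp/B07MH9DK5B">'
    'View Item on Amazon</a></div></body></html>'
)

print(find_links(static_html))    # []
print(find_links(rendered_html))  # ['https://www.amazon.com/dp/B07MH9DK5B']
```

If the same check against the real `response.text` comes back empty, that suggests the link is injected client-side, which is the case Selenium handles.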