Always getting None when web scraping with bs4

Time: 2020-08-29 06:11:37

Tags: python web-scraping beautifulsoup python-requests

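I'm new to Python. I've just started learning web scraping, so I decided to scrape the names of products listed on Amazon. I opened Chrome DevTools, clicked Inspect on an Amazon product name, and noted its class, which in this case is "a-link-normal". The problem is that the result I get is None. Here is the code: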

import webbrowser
import requests
from bs4 import BeautifulSoup

source = requests.get('https://www.amazon.in/s?k=books&ref=nb_sb_noss')
soup = BeautifulSoup(source.text, 'lxml')

name = soup.find('a', class_ = 'a-link-normal')
print(name)

Here is a screenshot of what I'm inspecting: link to image

I'm new to web scraping and overwhelmed by how complex websites are, so any advice is welcome.

Thanks for your help.

2 Answers:

Answer 0: (score: 1)

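It seems Amazon blocks crawling. I checked, and the first time you run the code the content can be extracted, but when the code is run again right away it gets blocked. If you print the soup variable, you will see the following notice: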

To discuss automated access to Amazon data please contact api-services-support@amazon.com. For information about migrating to our APIs refer to our Marketplace APIs at https://developer.amazonservices.in/ref=rm_c_sv, or our Product Advertising API at https://affiliate-program.amazon.in/gp/advertising/api/detail/main.html/ref=rm_c_ac.

Sorry, we just need to make sure you're not a robot. For best results, please make sure your browser is accepting cookies.
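One way to notice this situation in code is to look for the notice text before parsing. The following is only a minimal sketch: the helper name `looks_blocked` is hypothetical, and treating these two strings (taken from the notice quoted above) as a reliable signal is an assumption, since Amazon may change the wording at any time.

```python
# Marker strings taken from the notice quoted above. Assuming they only
# appear on the "are you a robot" page is a guess, not a documented rule.
BLOCK_MARKERS = (
    "api-services-support@amazon.com",
    "not a robot",
)


def looks_blocked(html: str) -> bool:
    """Return True if the HTML contains any of the block-page markers."""
    lowered = html.lower()
    return any(marker in lowered for marker in BLOCK_MARKERS)
```

With the code from the question, this could be called as `looks_blocked(source.text)` before building the soup, so that a blocked response is reported instead of silently producing None.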

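I suggest using the Selenium library rather than relying on delays in your code to imitate human interaction.

However, if you try running the code below after a few minutes, you can extract the book titles: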

import requests
from bs4 import BeautifulSoup

source = requests.get('https://www.amazon.in/s?k=books&ref=nb_sb_noss')
soup = BeautifulSoup(source.content, 'html.parser')
#print(soup)

names = soup.find_all('span', class_="a-size-medium a-color-base a-text-normal")
for name in names:
    print(name.text)
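The "wait a few minutes and try again" advice could be sketched as a small retry loop. Everything here is an assumption for illustration: `fetch_with_retries`, its parameters, and the 60 second default are hypothetical names and values, not part of requests or bs4, and nothing guarantees that Amazon lifts the block after any particular delay.

```python
import time
from typing import Callable, Optional


def fetch_with_retries(
    fetch: Callable[[], str],
    is_blocked: Callable[[str], bool],
    attempts: int = 3,
    delay_seconds: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Call fetch() until is_blocked(html) is False or attempts run out."""
    for attempt in range(attempts):
        html = fetch()
        if not is_blocked(html):
            return html
        # Wait before the next try, but not after the final one.
        if attempt < attempts - 1:
            sleep(delay_seconds)
    # Every attempt returned a blocked page.
    return None
```

Here `fetch` might be something like `lambda: requests.get(url).text`, and `is_blocked` a check for the notice text shown earlier. Passing `sleep` as a parameter keeps the function easy to try out without actually waiting.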

Answer 1: (score: 0)

To get the correct response from the Amazon server, set the User-Agent HTTP header:

import requests
from bs4 import BeautifulSoup


headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:80.0) Gecko/20100101 Firefox/80.0'}
source = requests.get('https://www.amazon.in/s?k=books&ref=nb_sb_noss', headers=headers)
soup = BeautifulSoup(source.text, 'lxml')

for a in soup.select('a.a-link-normal > span.a-size-medium'):
    print(a.get_text(strip=True))

Prints:

The Power of Your Subconscious Mind (DELUXE HARDBOUND EDITION)
World’s Greatest Books For Personal Growth & Wealth (Set of 4 Books): Perfect Motivational Gift Set
Ikigai: The Japanese secret to a long and happy life
Attitude Is Everything: Change Your Attitude ... Change Your Life!
World’s Greatest Books For Personal Growth & Wealth (Set of 4 Books): Perfect Motivational Gift Set
The Theory of Everything
The Subtle Art of Not Giving a F*ck
The Alchemist
The Monk Who Sold His Ferrari
The Rudest Book Ever
As a Man Thinketh
How to Stop Worrying and Start Living: Time-Tested Methods for Conquering Worry
Help Hungry Henry Deal with Anger : An Interactive Picture Book About Anger Management
The Girl in Room 105
The Blue Umbrella
Wings of Fire: An Autobiography of Abdul Kalam
My First Library: Boxset of 10 Board Books for Kids
Who Will Cry When You Die?
Rich Dad Poor Dad : What The Rich Teach Their Kids About Money That the Poor and Middle Class Do Not!
Rough Book
The Leader Who Had No Title
The Power Of Influence
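To see why the selector `a.a-link-normal > span.a-size-medium` picks out only the titles, and why `find` can return None, it may help to try both on a small inline document. The markup below is hypothetical and heavily simplified; the real Amazon page is much larger and its class names may change.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup standing in for a search results page.
SAMPLE_HTML = """
<div>
  <a class="a-link-normal" href="/book-1">
    <span class="a-size-medium a-color-base a-text-normal"> The Alchemist </span>
  </a>
  <a class="a-link-normal" href="/book-2"><img alt="cover"></a>
  <span class="a-size-medium">Not inside a link</span>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# The ">" combinator matches only <span> elements that are direct children
# of a matching <a>, so the image link and the stray span are skipped.
titles = [
    span.get_text(strip=True)
    for span in soup.select("a.a-link-normal > span.a-size-medium")
]
print(titles)

# find() returns None when nothing matches, which is consistent with what
# the question saw when the response was not the results page.
missing = soup.find("a", class_="no-such-class")
print(missing)
```

Because `find` returns None rather than raising, it is worth checking the result before accessing attributes such as `.text` on it.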