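I'm new to Python. I've just started learning web scraping, so I decided to scrape Amazon for the names of the listed products. I opened Chrome DevTools, clicked Inspect on an Amazon product name, and noted its class, which in this case is "a-link-normal". The problem is that the result I get is None. Here is the code: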
import webbrowser
import requests
from bs4 import BeautifulSoup
source = requests.get('https://www.amazon.in/s?k=books&ref=nb_sb_noss')
soup = BeautifulSoup(source.text, 'lxml')
name = soup.find('a', class_ = 'a-link-normal')
print(name)
I'm new to web scraping and overwhelmed by how complex websites are, so any suggestions are welcome.
Thanks for your help.
Answer 0 (score: 1)
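It seems that Amazon blocks crawling. I checked: the first time you run the code, the content can be extracted. When the code is run again immediately afterwards, the request is blocked. If you print the soup variable, you will see the following notice: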
To discuss automated access to Amazon data, please contact api-services-support@amazon.com. For information about migrating to our APIs, refer to our Marketplace APIs at https://developer.amazonservices.in/ref=rm_c_sv, or our Product Advertising API at https://affiliate-program.amazon.in/gp/advertising/api/detail/main.html/ref=rm_c_ac.
Sorry, we just need to make sure you're not a robot. For best results, please make sure your browser is accepting cookies.
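Since the blocked response is still an ordinary HTML page, it can help to check for this notice before parsing. Here is a minimal sketch; `looks_blocked` and `BLOCK_MARKERS` are hypothetical names, and the marker strings are assumptions based on the notice quoted above, so they may need adjusting if Amazon changes the wording.

```python
# Hypothetical helper, not part of requests or BeautifulSoup.
# The markers are assumptions taken from the notice quoted above.
BLOCK_MARKERS = (
    "api-services-support@amazon.com",
    "not a robot",
)


def looks_blocked(html: str) -> bool:
    """Return True if the page text contains one of the assumed block markers."""
    text = html.lower()
    return any(marker.lower() in text for marker in BLOCK_MARKERS)


if __name__ == "__main__":
    sample = (
        "<html>To discuss automated access to Amazon data please contact "
        "api-services-support@amazon.com.</html>"
    )
    print(looks_blocked(sample))
```

If this returns True for `source.text`, a `find` or `find_all` call that comes back empty is most likely caused by the block page rather than by a wrong class name.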
I suggest using the Selenium library instead, and adding some delays in your code so that it behaves more like human interaction.
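The idea of adding delays can be sketched without Selenium as a small retry loop. This is only an illustration under stated assumptions: `fetch_with_delays` is a hypothetical helper, the delay values are arbitrary, and nothing here guarantees that Amazon will stop blocking the requests.

```python
import random
import time


def fetch_with_delays(fetch, is_blocked, attempts=3, base_delay=5.0, sleep=time.sleep):
    """Call fetch() up to `attempts` times, pausing between tries.

    fetch      -- zero-argument callable returning the page HTML as a string
    is_blocked -- callable returning True when the HTML looks like a block page
    sleep      -- injectable so the pauses can be skipped or recorded in tests
    Returns the first HTML that is not blocked, or None if every try is blocked.
    """
    for attempt in range(attempts):
        html = fetch()
        if not is_blocked(html):
            return html
        if attempt < attempts - 1:
            # Wait longer after each failed try, plus up to one second of jitter.
            sleep(base_delay * (2 ** attempt) + random.uniform(0, 1))
    return None


# Possible wiring with requests (assumed, left commented out so the sketch runs offline):
# import requests
# url = 'https://www.amazon.in/s?k=books&ref=nb_sb_noss'
# html = fetch_with_delays(
#     lambda: requests.get(url).text,
#     lambda page: "api-services-support@amazon.com" in page,
# )
```

Passing `fetch` and `is_blocked` as callables keeps the loop independent of how the page is downloaded, so the same helper could wrap a Selenium call as well.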
However, if you wait a few minutes and then run the code below, you can extract the book titles:
import requests
from bs4 import BeautifulSoup
source = requests.get('https://www.amazon.in/s?k=books&ref=nb_sb_noss')
soup = BeautifulSoup(source.content, 'html.parser')
# print(soup)  # uncomment to inspect the raw response, e.g. to spot the robot-check notice
names = soup.find_all('span', class_="a-size-medium a-color-base a-text-normal")
for name in names:
    print(name.text)
Answer 1 (score: 0)
To get a correct response from the Amazon server, set the User-Agent HTTP header:
import requests
from bs4 import BeautifulSoup
headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:80.0) Gecko/20100101 Firefox/80.0'}
source = requests.get('https://www.amazon.in/s?k=books&ref=nb_sb_noss', headers=headers)
soup = BeautifulSoup(source.text, 'lxml')
for a in soup.select('a.a-link-normal > span.a-size-medium'):
    print(a.get_text(strip=True))
Prints:
The Power of Your Subconscious Mind (DELUXE HARDBOUND EDITION)
World’s Greatest Books For Personal Growth & Wealth (Set of 4 Books): Perfect Motivational Gift Set
Ikigai: The Japanese secret to a long and happy life
Attitude Is Everything: Change Your Attitude ... Change Your Life!
World’s Greatest Books For Personal Growth & Wealth (Set of 4 Books): Perfect Motivational Gift Set
The Theory of Everything
The Subtle Art of Not Giving a F*ck
The Alchemist
The Monk Who Sold His Ferrari
The Rudest Book Ever
As a Man Thinketh
How to Stop Worrying and Start Living: Time-Tested Methods for Conquering Worry
Help Hungry Henry Deal with Anger : An Interactive Picture Book About Anger Management
The Girl in Room 105
The Blue Umbrella
Wings of Fire: An Autobiography of Abdul Kalam
My First Library: Boxset of 10 Board Books for Kids
Who Will Cry When You Die?
Rich Dad Poor Dad : What The Rich Teach Their Kids About Money That the Poor and Middle Class Do Not!
Rough Book
The Leader Who Had No Title
The Power Of Influence
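The two answers select the titles in different ways, and the difference can be tried offline. The sketch below runs both lookups against a small, made-up HTML fragment. `SAMPLE_HTML` is hypothetical markup that only borrows the class names used above; the real Amazon page is far larger and may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only, not copied from Amazon.
SAMPLE_HTML = """
<div>
  <a class="a-link-normal a-text-normal" href="/book-1">
    <span class="a-size-medium a-color-base a-text-normal">The Alchemist</span>
  </a>
  <a class="a-link-normal" href="/book-2">
    <span class="a-color-base a-size-medium">Rough Book</span>
  </a>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# A multi-class string passed to class_ is compared with the exact value of
# the class attribute, so the order and the full set of classes must match.
exact = [
    span.get_text(strip=True)
    for span in soup.find_all("span", class_="a-size-medium a-color-base a-text-normal")
]

# A CSS selector only requires the listed classes to be present; here it also
# requires the span to be a direct child of an a.a-link-normal element.
css = [
    span.get_text(strip=True)
    for span in soup.select("a.a-link-normal > span.a-size-medium")
]

print(exact)
print(css)
```

On this sample the exact-string lookup finds only the first title, while the CSS selector finds both, so the selector is more tolerant of class order and of extra or missing classes.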