Web scraping Amazon <p> tags with Python 3, requests, and bs4

Time: 2018-10-24 09:02:55

Tags: python-3.x

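Currently, if I run the code with headers, the printed list is empty. If I run it without headers, the printed list contains data, but the request can fail with 503 Server Error: Service Unavailable. I don't understand why the list is empty when I use headers. Thanks for your help.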

import bs4
import requests

headers = {"User-Agent":"Mozilla/5.0 (Windows NT 7.0; Win64; x64; rv:55.0) Gecko/20100101 Firefox/55.0"}
res = requests.get('https://www.amazon.com/dp/B07GMXQN8X', headers=headers)
soup = bs4.BeautifulSoup(res.content,'html.parser')
a = soup.find_all('p')
print(a)

1 answer:

Answer 0 (score: -1)

You need to use res.text instead of res.content. You can also try changing the parser:

import bs4
import requests

headers = {"User-Agent":"Mozilla/5.0 (Windows NT 7.0; Win64; x64; rv:55.0) Gecko/20100101 Firefox/55.0"}
res = requests.get('https://www.amazon.com/dp/B07GMXQN8X', headers=headers)
soup = bs4.BeautifulSoup(res.text, 'html5lib')
# soup = bs4.BeautifulSoup(res.text, 'lxml')

a = soup.find_all('p')
print(a)

Output:

[<p>Sponsored Products are advertisements for products sold by merchants on Amazon.com. When you click on a Sponsored Product ad, you will be taken to an Amazon detail page where you can learn more about the product and purchase it.</p>, <p>        To learn more about Amazon Sponsored Products,<span class="a-letter-space"></span>        <a class="a-link-normal" href="https://advertising.amazon.com/products-self-serve?ref_=ext_amzn_wtsp" rel="noopener" target="_blank" title="click here">click here</a>.    </p>, <p class="a-spacing-small a-size-small a-color-secondary">
    Find answers in product info, Q&amp;As, reviews
  </p>, <p class="a-spacing-base a-spacing-top-base a-color-error askError askBadQuestionError">
            Please make sure that you are posting in the form of a question.
          </p>, <p>Sponsored Products are advertisements for products sold by merchants on Amazon.com. When you click on a Sponsored Product ad, you will be taken to an Amazon detail page where you can learn more about the product and purchase it.</p>, <p>        To learn more about Amazon Sponsored Products,<span class="a-letter-space"></span>        <a class="a-link-normal" href="https://advertising.amazon.com/products-self-serve?ref_=ext_amzn_wtsp" rel="noopener" target="_blank" title="click here">click here</a>.    </p>, <p class="nav_p nav-bold">There's a problem loading this menu right now.</p>, <p class="nav_p"><a class="nav_a" href="/gp/prime/ref=nav_prime_ajax_err/145-9045450-0196650">Learn more about Amazon Prime.</a></p>]
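
When the list comes back empty, it can help to check what the server actually returned before changing anything in the parsing step. The following is a minimal sketch, not part of the original answer: summarize_response is a hypothetical helper that reports the status code, the body length, and how many opening <p> tags appear in the raw markup, counted with a regular expression so the result does not depend on any parser. If the raw count is zero, the page itself contains no <p> tags and switching parsers is unlikely to help; if it is non-zero while find_all('p') is empty, the parser is the more likely suspect.

```python
import re


def summarize_response(status_code, body):
    """Return basic facts about an HTTP response (hypothetical helper)."""
    # Count opening <p> tags in the raw markup, independent of any parser.
    # The character class keeps tags such as <pre> or <param> out of the count.
    raw_p_tags = len(re.findall(r"<p[\s>]", body, flags=re.IGNORECASE))
    return {
        "ok": 200 <= status_code < 300,  # False for errors such as 503
        "status_code": status_code,
        "body_length": len(body),
        "raw_p_tags": raw_p_tags,
    }


# Usage with the response object from the code above (requires network access):
#   res = requests.get('https://www.amazon.com/dp/B07GMXQN8X', headers=headers)
#   print(summarize_response(res.status_code, res.text))
```

Running this once with headers and once without shows whether the two requests receive the same page, which is the first thing to rule out.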