Trying to scrape data from Amazon

Time: 2019-01-24 20:52:04

Tags: python python-requests

I'm trying to get the titles of the results from an Amazon search, but I can't find them.

import bs4 as bs
import requests

url = 'https://www.amazon.de/s/ref=nb_sb_noss_2?__mk_de_DE=%C3%85M%C3%85%C5%BD%C3%95%C3%91&url=search-alias%3Daps&field-keywords=folie'
c = requests.get(url)

soup = bs.BeautifulSoup(c.content, 'lxml')

data_search = soup.find_all('ul', {'id': 's-results-list-atf'})

for link in data_search:
    print(link.contents[0].find_all('a',
                                    {
                                        'class': 'a-link-normal s-access-detail-page  s-color-twister-title-link a-text-normal'}))

At the moment I get no results at all, and I don't know why.

I'm trying to get the title heading of each search result.

Edit:

I'm now trying to get the product's brand, but my console gets flooded with output instead.

import bs4 as bs
import requests
from lxml import etree

browser2 = {
    'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.157 Safari/537.36'}
s = requests.Session()
res = s.get('https://login.live.com')
cookies = dict(res.cookies)

request2 = s.get(
    'https://www.amazon.de/BB-Verpackungen-Stretchfolie-transparent-Palettenfolie-Wickelfolie/dp/B004W3O4PS',
    headers=browser2)
soup2 = bs.BeautifulSoup(request2.content, 'lxml')


start = soup2.find_all('div', class_='centerColAlign')

for s in start:
    brand = s.find_all('div', class_='a-section a-spacing-none')
    for b in brand:
        s = b.find_all('a', {'id': 'bylineInfo'})
        for i in s:
            print(i.text)

1 answer:

Answer 0 (score: 2)

I tested it and changed two things:

  1. Set a browser User-Agent header
  2. Removed the double space between the class names
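
Why the double space matters can be shown on a small made-up snippet. This is a minimal sketch, assuming BeautifulSoup compares a multi-class search string against the tag's class names joined by single spaces; the HTML below is invented for illustration and is not Amazon's real markup.

```python
from bs4 import BeautifulSoup

# Invented markup for illustration only (not real Amazon HTML).
html = '<a class="a-link-normal s-access-detail-page a-text-normal" title="Folie">Folie</a>'
soup = BeautifulSoup(html, 'html.parser')

# The search string equals the class names joined by single spaces: match.
exact = soup.find_all('a', {'class': 'a-link-normal s-access-detail-page a-text-normal'})

# Same class names, but with a stray double space in the search string: no match.
double_space = soup.find_all('a', {'class': 'a-link-normal s-access-detail-page  a-text-normal'})

# Searching for a single class matches any tag that carries it,
# regardless of the other classes or their spacing.
single_class = soup.find_all('a', class_='s-access-detail-page')

print(len(exact), len(double_space), len(single_class))
```

Searching on one distinctive class, as in the last call, is usually less fragile than repeating the whole class string.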

Here is the full code:

import bs4 as bs
import requests

url = 'https://www.amazon.de/s/ref=nb_sb_noss_2?__mk_de_DE=%C3%85M%C3%85%C5%BD%C3%95%C3%91&url=search-alias%3Daps&field-keywords=folie'
c = requests.get(url, headers = { 'User-Agent' : 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:64.0) Gecko/20100101 Firefox/64.0'})
#print(c.content)
soup = bs.BeautifulSoup(c.content, 'html.parser')

data_search = soup.find_all('ul', {'id': 's-results-list-atf'})


for link in data_search:
    #print(link)
    #print(type(link))
    f = link.find_all('a', { 'class' : 'a-link-normal s-access-detail-page s-color-twister-title-link a-text-normal' })
    #print(f)
    for a in f:
        print(a)

This is the output of that code:

<a class="a-link-normal s-access-detail-page s-color-twister-title-link a-text-normal" href="/gp/slredirect/picassoRedirect.html/ref=pa_sp_atf_aps_sr_pg1_1?ie=UTF8&amp;adId=A00173582V1J1Z4JYCC99&amp;url=https%3A%2F%2Fwww.amazon.de%2FAuto-Folie-selbstklebend-BLASENFREI-Klebefolie%2Fdp%2FB00TDS0PVE%2Fref%3Dsr_1_1_sspa%2F259-1606642-0146458%3Fie%3DUTF8%26qid%3D1548365739%26sr%3D8-1-spons%26keywords%3Dfolie%26psc%3D1&amp;qualifier=1548365739&amp;id=7589640135518839&amp;widgetName=sp_atf" title="4€/m² Auto Folie - schwarz matt - 3 x 1,5 meter selbstklebend BLASENFREI flexibel Car Wrapping Klebefolie"><h2 class="a-size-medium s-inline s-access-title a-text-normal" data-attribute="4€/m² Auto Folie - schwarz matt - 3 x 1,5 meter selbstklebend BLASENFREI flexibel Car Wrapping Klebefolie" data-max-rows="2"><span class="a-offscreen">[Gesponsert]</span>4€/m² Auto Folie - schwarz matt - 3 x 1,5 meter selbstklebend BLASENFREI flexibel Car Wrapping Klebefolie</h2></a>
<a class="a-link-normal s-access-detail-page s-color-twister-title-link a-text-normal" href="/gp/slredirect/picassoRedirect.html/ref=pa_sp_atf_aps_sr_pg1_2?ie=UTF8&amp;adId=A03283802DFVL9Z711KMN&amp;url=https%3A%2F%2Fwww.amazon.de%2FFOSHIO-T%25C3%25B6nungsfolie-Installation-Cuttermesser-Werkzeugtasche%2Fdp%2FB06XNRKD2X%2Fref%3Dsr_1_2_sspa%2F259-1606642-0146458%3Fie%3DUTF8%26qid%3D1548365739%26sr%3D8-2-spons%26keywords%3Dfolie%26psc%3D1&amp;qualifier=1548365739&amp;id=7589640135518839&amp;widgetName=sp_atf" title="FOSHIO Autofolie Wrapping Werkzeug Kit für Auto Tönungsfolie Installation Mit Magnete Filz, Schaber Kuststoff,Rakel mit Filzkante, Cuttermesser,Folienrakel und Handschuhe, Werkzeugtasche"><h2 class="a-size-medium s-inline s-access-title a-text-normal" data-attribute="FOSHIO Autofolie Wrapping Werkzeug Kit für Auto Tönungsfolie Installation Mit Magnete Filz, Schaber Kuststoff,Rakel mit Filzkante, Cuttermesser,Folienrakel und Handschuhe, Werkzeugtasche" data-max-rows="2"><span class="a-offscreen">[Gesponsert]</span>FOSHIO Autofolie Wrapping Werkzeug Kit für Auto Tönungsfolie Installation Mit Magnete Filz, Schaber Kuststoff,Rakel mit Filzkante, Cuttermesser,Folienrakel und Handschuhe, Werkzeugtasche</h2></a>
<a class="a-link-normal s-access-detail-page s-color-twister-title-link a-text-normal" href="https://www.amazon.de/BB-Verpackungen-Stretchfolie-transparent-Palettenfolie-Wickelfolie/dp/B004W3O4PS/ref=sr_1_3/259-1606642-0146458?ie=UTF8&amp;qid=1548365739&amp;sr=8-3&amp;keywords=folie" title="BB-Verpackungen Hand Stretchfolie 23 my (transparent) 500 mm x 285 m, Palettenfolie Handfolie Wickelfolie"><h2 class="a-size-medium s-inline s-access-title a-text-normal" data-attribute="BB-Verpackungen Hand Stretchfolie 23 my (transparent) 500 mm x 285 m, Palettenfolie Handfolie Wickelfolie" data-max-rows="2">BB-Verpackungen Hand Stretchfolie 23 my (transparent) 500 mm x 285 m, Palettenfolie Handfolie Wickelfolie</h2></a>
<a class="a-link-normal s-access-detail-page s-color-twister-title-link a-text-normal" href="https://www.amazon.de/Neoxxim-22%E2%82%AC-Premium-blasenfrei-Luftkan%C3%A4len/dp/B01MYTRMXY/ref=sr_1_4/259-1606642-0146458?ie=UTF8&amp;qid=1548365739&amp;sr=8-4&amp;keywords=folie" title="Neoxxim 24,22€/m2 Premium - Auto Folie - MATT - SCHWARZ - SCHWARZ MATT 30 x 150 cm - blasenfrei mit Luftkanälen ca 0,15mm dick für Auto Folierung folieren bekleben"><h2 class="a-size-medium s-inline s-access-title a-text-normal" data-attribute="Neoxxim 24,22€/m2 Premium - Auto Folie - MATT - SCHWARZ - SCHWARZ MATT 30 x 150 cm - blasenfrei mit Luftkanälen ca 0,15mm dick für Auto Folierung folieren bekleben" data-max-rows="2">Neoxxim 24,22€/m2 Premium - Auto Folie - MATT - SCHWARZ - SCHWARZ MATT 30 x 150 cm - blasenfrei mit Luftkanälen ca 0,15mm dick für Auto Folierung folieren bekleben</h2></a>
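
The output above prints whole `<a>` tags, while the question asks for the titles only. Each tag in that output carries a `title` attribute, so a minimal sketch for pulling out just the text could look like this. The markup and the `EXAMPLE` hrefs are placeholders that mirror the structure of the output, not real results.

```python
from bs4 import BeautifulSoup

# Placeholder markup shaped like the tags printed above.
html = """
<ul id="s-results-list-atf">
  <li><a class="a-link-normal s-access-detail-page s-color-twister-title-link a-text-normal"
         href="/dp/EXAMPLE1" title="Example title one"><h2>Example title one</h2></a></li>
  <li><a class="a-link-normal s-access-detail-page s-color-twister-title-link a-text-normal"
         href="/dp/EXAMPLE2" title="Example title two"><h2>Example title two</h2></a></li>
</ul>
"""

soup = BeautifulSoup(html, 'html.parser')

titles = []
for result_list in soup.find_all('ul', {'id': 's-results-list-atf'}):
    for a in result_list.find_all('a', class_='s-access-detail-page'):
        # .get() returns None instead of raising if the attribute is missing.
        title = a.get('title')
        if title:
            titles.append(title)

for title in titles:
    print(title)
```

Reading the `title` attribute also avoids the hidden "[Gesponsert]" span that is part of the tag text for sponsored results.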

Hopefully it works after this!


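Edit: It looks like JavaScript manipulates the HTML, so part of the result list arrives wrapped in HTML comments. I strip the comment markers before parsing, and the first find_all() also had to be changed. After that I got 34 results instead of the 4 from the first code.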

import bs4 as bs
import requests

url = 'https://www.amazon.de/s/ref=nb_sb_noss_2?__mk_de_DE=%C3%85M%C3%85%C5%BD%C3%95%C3%91&url=search-alias%3Daps&field-keywords=folie'
c = requests.get(url, headers = { 'User-Agent' : 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:64.0) Gecko/20100101 Firefox/64.0'})
#print(c.content)
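# c.text is a str; calling .replace() with str arguments on the bytes in c.content raises TypeError on Python 3
soup = bs.BeautifulSoup(c.text.replace("<!--", "").replace("-->", ""), 'html.parser')   # strip comment markers so commented-out results get parsed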

data_search = soup.find_all('ul', {'class': 's-result-list'})

count = 0
for link in data_search:
    #print(link)
    #print(type(link))
    f = link.find_all('a', { 'class' : 'a-link-normal s-access-detail-page s-color-twister-title-link a-text-normal' })
    #print(f)
    for a in f:
        print(a)
        count += 1

print(count)
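
The effect of stripping the comment markers can be checked offline on a small made-up page. This is only a sketch: the markup, the `title` class and the `count_titles` helper are invented for illustration, on the assumption that the extra results sit inside HTML comments as in the code above.

```python
from bs4 import BeautifulSoup

# Made-up markup: the second result is hidden inside an HTML comment.
html = """
<ul class="s-result-list">
  <li><a class="title" title="First">First</a></li>
  <!-- <li><a class="title" title="Second">Second</a></li> -->
</ul>
"""

def count_titles(markup):
    """Parse the markup and count the title links the parser can see."""
    soup = BeautifulSoup(markup, 'html.parser')
    return len(soup.find_all('a', class_='title'))

# Commented-out markup is parsed as a comment node, not as tags.
before = count_titles(html)

# After removing the comment markers, the hidden link becomes a real tag.
after = count_titles(html.replace('<!--', '').replace('-->', ''))

print(before, after)
```

Note that this blunt replacement also exposes anything else the page had commented out, so the selector used afterwards should stay specific.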