Web scraping more pages

Time: 2020-07-06 08:41:14

Tags: python python-3.x web-scraping beautifulsoup python-requests

I'm currently scraping a website that automatically loads more data into the page. I'm using BeautifulSoup and requests.

import requests
from bs4 import BeautifulSoup

page = requests.get("https://www.monki.com/en/newin/view-all-new.html")
soup = BeautifulSoup(page.content, 'html.parser')
article_codes = []
for k in soup.findAll('div', attrs={"class": "producttile-details"}):
    article_code = k.find('span', attrs={'class': "articleCode"})
    print(article_code)
    article_codes.append(article_code.text)

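With this code I only get the data from the initial page, but I want all the data that is available once the page has finished loading.

1 answer:

Answer 0: (score: 0)

The page uses JavaScript to load additional pages of products. You can simulate those requests with the requests module.

For example: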

import requests
from bs4 import BeautifulSoup

url = 'https://www.monki.com/en_eur/newin/view-all-new/_jcr_content/productlisting.products.html'
headers = {
    'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:78.0) Gecko/20100101 Firefox/78.0',
}

with requests.session() as s:
    # Visit the listing page first (presumably so the session picks up any cookies)
    s.get('https://www.monki.com/en_eur/newin/view-all-new.html', headers=headers)

    for page in range(0, 10):  # <-- adjust to required number of pages
        # The offset advances by 28 per request, one batch of products at a time
        response = s.get(url, params={'offset': page * 28}, headers=headers)
        soup = BeautifulSoup(response.content, 'html.parser')

        for product in soup.select('.o-product'):
            name = product.select_one('.product-name').get_text(strip=True)
            price = product.select_one('.price-tag').get_text(strip=True)
            link = product.select_one('.a-link')['href']

            print('{:<50} {:<10} {}'.format(name, price, link))

This prints all the products:

NEW! Maxi smock dress                              €30        https://www.monki.com/en_eur/clothing/dresses/midi-dresses/product.midi-button-up-shirt-dress-black.0871799004.html
NEW! Retro skater dress                            €20        https://www.monki.com/en_eur/clothing/dresses/mini-dresses/product.retro-skater-dress-white.0688447029.html
NEW! Mozik block jeans                             €40        https://www.monki.com/en_eur/clothing/jeans/product.mozik-block-jeans-blue.0874088001.html
NEW! Pack of two scrunchies                        €6         https://www.monki.com/en_eur/accessories/hair-accessories/product.pack-of-two-scrunchies-beige.0530296078.html
NEW! Mini hand bag                                 €18        https://www.monki.com/en_eur/accessories/bags,-wallets-belts/bags/product.mini-hand-bag-black.0826291006.html
NEW! Fitted crop top                               €10        https://www.monki.com/en_eur/clothing/tops/t-shirts/product.fitted-crop-top-purple.0906440002.html
NEW! Tiered smock dress                            €30        https://www.monki.com/en_eur/clothing/dresses/midi-dresses/product.tiered-smock-dress-blue.0895277004.html
NEW! Mini hand bag                                 €18        https://www.monki.com/en_eur/accessories/bags,-wallets-belts/bags/product.mini-hand-bag-beige.0826291008.html
NEW! Fitted t-shirt                                €10        https://www.monki.com/en_eur/clothing/tops/t-shirts/product.fitted-t-shirt-purple.0905746002.html
NEW! Shoulder pads t-shirt dress                   €25        https://www.monki.com/en_eur/clothing/dresses/mini-dresses/product.shoulder-pads-t-shirt-dress-beige.0929301002.html
NEW! Yoko mid blue jeans                           €40        https://www.monki.com/en_eur/clothing/jeans/product.yoko-mid-blue-jeans-blue.0656425001.html
NEW! Yoko classic blue jeans                       €40        https://www.monki.com/en_eur/clothing/jeans/product.yoko-classic-blue-jeans-blue.0807218001.html
NEW! Pleated midi skirt                            €25        https://www.monki.com/en_eur/clothing/skirts/midi-skirts/product.pleated-midi-skirt-black.0562278003.html

... and so on.
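
Instead of hard-coding the number of pages, the loop could stop as soon as a request returns no products. The following is only a minimal sketch of that idea: `iter_pages`, `fetch_items` and `fake_fetch` are hypothetical names, not part of the answer above, and it assumes that the endpoint returns an empty listing once the offset runs past the last product, which has not been verified for this site. A stand-in fetcher replaces the real HTTP request so the sketch runs without network access.

```python
from typing import Callable, Iterator, List


def iter_pages(
    fetch_items: Callable[[int], List[str]],
    page_size: int = 28,
    max_pages: int = 100,
) -> Iterator[str]:
    """Yield items batch by batch until a batch comes back empty."""
    for page in range(max_pages):  # max_pages is a safety cap, not a target
        items = fetch_items(page * page_size)
        if not items:
            # Assumption: an empty batch means the listing is exhausted.
            break
        yield from items


def fake_fetch(offset: int) -> List[str]:
    """Stand-in for the real request: pretends the listing has 60 products."""
    catalogue = [f"product-{i}" for i in range(60)]
    return catalogue[offset:offset + 28]


products = list(iter_pages(fake_fetch))
print(len(products))
```

In the answer's code, `fetch_items` would wrap the `s.get(url, params={'offset': offset}, headers=headers)` call and return `soup.select('.o-product')`. The `max_pages` cap keeps the loop finite if the site never returns an empty batch.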