How do I get the price from 2 different templates using XPath or bs4? (Python, web scraping)

Time: 2018-07-16 09:50:24

Tags: python selenium web-scraping

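I have two templates with different price markup. For the first template, the price has the id priceblock_ourprice and is printed correctly, but for the second template the price is not shown. How can I write the price to the CSV? You can use XPath or Beautiful Soup. The problem in this code is in the first try/except statement. I have attached the code and the output (CSV). Any help would be appreciated.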

This is the output (CSV): [screenshot not included]

import csv
from selenium import webdriver
from bs4 import BeautifulSoup
import requests
from lxml import html

links = [
      'https://www.amazon.com/Stock-Your-Home-Spinning-Organizer/dp/B00424ILAQ/ref=sr_1_4012/138-3260504-2979110?s=bedbath&ie=UTF8&qid=1520585702&sr=1-4012&keywords=-sdfg',
      'https://www.amazon.com/Seward-Trunk-College-Footlocker-SWD5120-10/dp/B004835DI4/ref=sr_1_3?s=furniture&ie=UTF8&qid=1520407190&sr=1-3&keywords=-hgfd'
]
proxies = {
    'http': 'http://218.50.2.102:8080',
    'https': 'http://185.93.3.123:8080'
}

def get_information(driver,urls):
    with open('csv/sort_products.csv', "w", newline="", encoding="utf-8") as infile:
        writer = csv.writer(infile)
        writer.writerow(['Price',  'Link'])

        for url in urls:
            driver.get(url)
            soup = BeautifulSoup(driver.page_source,"lxml")

            try:
                price = driver.find_element_by_xpath('//span[@id="color_name_0_price"]/span').text

            except:
                price='No price v1'
                print('No price v1')

            try:
                price = driver.find_element_by_xpath('//span[@id="priceblock_ourprice"]').text
            except:
                price='No price v2'
                print('No price v2')

            writer.writerow([ price, url])
            print(f'{url}\n')

if __name__ == '__main__':
    chrome_options = webdriver.ChromeOptions()

    chrome_options.add_argument('--proxy-server="%s"' % ';'.join(['%s=%s' % (k, v) for k, v in proxies.items()]))

    driver = webdriver.Chrome(executable_path=r"C:\Users\Andrei-PC\Downloads\webdriver\chromedriver.exe",
                              chrome_options=chrome_options)
    get_information(driver,links)
    driver.quit()

1 Answer:

Answer 0 (score: 1)

I checked both URLs in a browser, and the second URL does not appear to contain a span with the ID priceblock_ourprice. So, naturally, driver.find_element_by_xpath cannot find a matching span.

However, I was able to find the following span: <span class="a-size-base a-color-price offer-price a-text-normal">$62.26</span>
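One way to handle both templates is to try each selector in order and keep the first one that yields text, instead of letting a later lookup overwrite an earlier result. This is only a minimal sketch under assumptions: `first_text` is a hypothetical helper (not part of Selenium), and the `offer-price` XPath in the usage comment is derived solely from the span quoted above, so it may not match what Amazon actually serves.

```python
from typing import Callable, Iterable


def first_text(lookups: Iterable[Callable[[], str]], default: str = "No price") -> str:
    """Return the first non-empty text produced by the lookups, tried in order."""
    for lookup in lookups:
        try:
            text = lookup()
        except Exception:
            # With Selenium this would typically be NoSuchElementException.
            continue
        if text and text.strip():
            return text.strip()
    # None of the lookups produced any text.
    return default


# Hypothetical wiring with the driver from the question (not executed here):
# price = first_text([
#     lambda: driver.find_element_by_xpath('//span[@id="priceblock_ourprice"]').text,
#     lambda: driver.find_element_by_xpath('//span[contains(@class, "offer-price")]').text,
# ])
```

Because the function returns as soon as one lookup succeeds, a missing element in one template can no longer replace a price that was already found in the other.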

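The content that Amazon's servers return may differ between your regular browser and a Selenium session (for example, because of cookies). Please double-check the page source in Selenium.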
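To see what Selenium actually received, one option is to save `driver.page_source` and check it for the expected markers. A minimal sketch follows; `check_page_source` and the file name `page_dump.html` are hypothetical names chosen for illustration.

```python
from pathlib import Path
from typing import Dict, Iterable, Optional


def check_page_source(page_source: str, markers: Iterable[str],
                      dump_path: Optional[str] = None) -> Dict[str, bool]:
    """Report which marker strings occur in the HTML; optionally save the HTML."""
    if dump_path is not None:
        # Keep a copy on disk so it can be opened and inspected by hand.
        Path(dump_path).write_text(page_source, encoding="utf-8")
    return {marker: marker in page_source for marker in markers}


# Hypothetical usage with the driver from the question (not executed here):
# found = check_page_source(driver.page_source,
#                           ["priceblock_ourprice", "offer-price"],
#                           dump_path="page_dump.html")
# print(found)
```

A plain substring check only shows whether a marker appears somewhere in the markup. It does not confirm that the element holds a price, so the saved file is still worth inspecting.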