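I have two page templates that mark up the price differently. On the first template the price has the id `priceblock_ourprice` and is printed correctly, but on the second template the price is not shown. How can I write the price to the CSV? You can use XPath or Beautiful Soup. The problem with this code is in the first try/except statement. I have attached the code and the output (CSV). Any help would be appreciated.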
Here is the output (CSV).
```python
import csv
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from bs4 import BeautifulSoup
import requests
from lxml import html

links = [
    'https://www.amazon.com/Stock-Your-Home-Spinning-Organizer/dp/B00424ILAQ/ref=sr_1_4012/138-3260504-2979110?s=bedbath&ie=UTF8&qid=1520585702&sr=1-4012&keywords=-sdfg',
    'https://www.amazon.com/Seward-Trunk-College-Footlocker-SWD5120-10/dp/B004835DI4/ref=sr_1_3?s=furniture&ie=UTF8&qid=1520407190&sr=1-3&keywords=-hgfd'
]

proxies = {
    'http': 'http://218.50.2.102:8080',
    'https': 'http://185.93.3.123:8080'
}

def get_information(driver, urls):
    with open('csv/sort_products.csv', "w", newline="", encoding="utf-8") as outfile:
        writer = csv.writer(outfile)
        writer.writerow(['Price', 'Link'])
        for url in urls:
            driver.get(url)
            soup = BeautifulSoup(driver.page_source, "lxml")  # parsed, but not used below
            try:
                price = driver.find_element_by_xpath('//span[@id="color_name_0_price"]/span').text
            except NoSuchElementException:
                price = 'No price v1'
                print('No price v1')
            # This second lookup always runs, so its result (or 'No price v2')
            # replaces whatever the first try/except stored in price.
            try:
                price = driver.find_element_by_xpath('//span[@id="priceblock_ourprice"]').text
            except NoSuchElementException:
                price = 'No price v2'
                print('No price v2')
            writer.writerow([price, url])
            print(f'{url}\n')

if __name__ == '__main__':
    chrome_options = webdriver.ChromeOptions()
    chrome_options.add_argument('--proxy-server="%s"' % ';'.join(['%s=%s' % (k, v) for k, v in proxies.items()]))
    # Raw string so the backslashes in the Windows path are not read as escapes
    driver = webdriver.Chrome(executable_path=r"C:\Users\Andrei-PC\Downloads\webdriver\chromedriver.exe",
                              chrome_options=chrome_options)
    get_information(driver, links)
    driver.quit()
```
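In the question's code the second try/except always runs, so on a page without `priceblock_ourprice` it overwrites whatever the first lookup found with `'No price v2'`. A minimal sketch of a first-match-wins alternative, assuming the XPath list below: the two ids come from the question, while the `offer-price` class comes from the answer and may not be stable across Amazon pages. It calls `driver.find_elements("xpath", ...)`, which returns an empty list instead of raising when nothing matches. `PRICE_XPATHS` and `first_price` are hypothetical names.

```python
# Hypothetical XPath list, tried in order. The first two ids are from the question;
# the offer-price class is the span mentioned in the answer (assumed, may change).
PRICE_XPATHS = [
    '//span[@id="priceblock_ourprice"]',
    '//span[@id="color_name_0_price"]/span',
    '//span[contains(@class, "offer-price")]',
]


def first_price(driver, xpaths=PRICE_XPATHS, default="No price"):
    """Return the text of the first non-empty element matched by any XPath, in order."""
    for xpath in xpaths:
        # find_elements returns [] when nothing matches, so no try/except is needed
        elements = driver.find_elements("xpath", xpath)
        if elements:
            text = elements[0].text.strip()
            if text:
                return text
    # None of the locators matched a non-empty element on this page
    return default
```

Inside the loop, the two try/except blocks would then collapse to `price = first_price(driver)` followed by `writer.writerow([price, url])`, and a later locator is only consulted when every earlier one found nothing.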
Answer (score: 1)
I checked both URLs in a browser, and it looks like the second URL has no `span` with the ID `priceblock_ourprice`, so of course `driver.find_element_by_xpath` cannot find a matching span.
However, I could find the following span: `<span class="a-size-base a-color-price offer-price a-text-normal">$62.26</span>`
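The span quoted above has no id, but its `offer-price` class can be matched with a CSS selector. A minimal Beautiful Soup sketch, assuming the selector list below and the built-in `html.parser` backend (the question's code already builds `soup` from `driver.page_source` but never uses it). `PRICE_SELECTORS` and `extract_price` are hypothetical names, and the selectors should be re-checked against the live pages.

```python
from bs4 import BeautifulSoup

# CSS selectors tried in order: the two ids from the question, then the
# offer-price class from the span found in the answer (assumed, may change).
PRICE_SELECTORS = [
    "#priceblock_ourprice",
    "#color_name_0_price > span",
    "span.offer-price",
]


def extract_price(page_source, selectors=PRICE_SELECTORS, default="No price"):
    """Return the first non-empty price text matched by any selector."""
    soup = BeautifulSoup(page_source, "html.parser")
    for selector in selectors:
        tag = soup.select_one(selector)  # None when the selector matches nothing
        if tag is not None:
            text = tag.get_text(strip=True)
            if text:
                return text
    return default
```

In the question's loop this would be called as `price = extract_price(driver.page_source)`. A class selector such as `span.offer-price` matches even though the span carries several other classes.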
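The content Amazon's servers return when you browse manually may differ from what they return when you run Selenium (for example, because of cookies). Please double-check the page source that Selenium actually receives.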
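To compare what Selenium receives with what the browser shows, one option is to save `driver.page_source` for each URL and check which price markers it contains. A minimal sketch; the output folder `page_dumps`, the hash-based file names, the marker list and the helper names `dump_page_source` and `price_markers` are all assumptions. The marker check is deliberately crude: it only reports whether each string appears anywhere in the raw HTML.

```python
import hashlib
from pathlib import Path


def dump_page_source(page_source, url, out_dir="page_dumps"):
    """Write the HTML Selenium received to a file named after a hash of the URL."""
    folder = Path(out_dir)
    folder.mkdir(parents=True, exist_ok=True)
    # Amazon URLs are long and full of '?', '&' and '/', so hash them for a safe filename
    filename = hashlib.sha1(url.encode("utf-8")).hexdigest()[:12] + ".html"
    path = folder / filename
    path.write_text(page_source, encoding="utf-8")
    return path


def price_markers(page_source,
                  markers=("priceblock_ourprice", "color_name_0_price", "offer-price")):
    """Map each marker to True/False depending on whether it occurs in the raw HTML."""
    return {marker: marker in page_source for marker in markers}
```

For example, right after `driver.get(url)`: `print(url, price_markers(driver.page_source))` and `dump_page_source(driver.page_source, url)`, then open the saved file to see which price markup was actually served to Selenium.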