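This is a loop that scrapes several elements. Sometimes the price is not found. Instead of passing in the except branch, I need to print/write a value for the items that have no price. The reason is that when it just passes, the variable values no longer match up when printing (title, link, image, price). I hope you can see the logic of what I am trying to accomplish. I have also attached a screenshot so you can see what I mean. Thanks in advance.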
#finds titles
deal_title = browser.find_elements_by_xpath("//a[@id='dealTitle']/span")
titles = []
for title in deal_title:
    titles.append(title.text)
#finds links
deal_link = browser.find_elements_by_xpath("//div[@class='a-row dealDetailContainer']/div/a[@id='dealTitle']")
links = []
for link in deal_link:
    links.append(link.get_attribute('href'))
#finds images
deal_image = browser.find_elements_by_xpath("//a[@id='dealImage']/div/div/div/img")
images = []
for image in deal_image:
    images.append(image.get_attribute('src'))
try:
    deal_price = browser.find_elements_by_xpath("//div[@class='a-row priceBlock unitLineHeight']/span")
    prices = []
    for price in deal_price:
        prices.append(price.text)
except NoSuchElementException:
    price = ("PRINT/WRITE THIS TEXT INSTEAD OF PASSING")
#writes to html
for title, link, image, price in zip(titles, links, images, prices):
    f.write("<tr class='border'><td class='image'>" + "<img src=" + image + "></td>" + "<td class='title'><a href=" + link + '>'">" + title + "</a></td><td class='price'>" + price + "</td></tr>")
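Answer 0 (score: 0)
If I understand correctly, your problem is that some page elements are not loaded "in time".
The elements you want to read may not have loaded yet at the moment you read them.
To prevent this, you can use explicit waits (the script waits up to a specified time for the specified element to load).
With explicit waits you are less likely to miss values.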
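As a rough illustration of what an explicit wait does, here is a minimal pure-Python sketch of the polling loop behind the idea. The names `wait_until` and `fake_find_prices` are hypothetical and exist only for this example; Selenium ships its own facility for this (`WebDriverWait` together with `expected_conditions`), which is what you would normally use against a real browser.

```python
import time


def wait_until(condition, timeout=10.0, poll=0.5):
    """Call condition() until it returns a truthy value or the timeout elapses.

    Returns the truthy value; raises TimeoutError if the deadline passes first.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1f s" % timeout)
        time.sleep(poll)


# Stand-in for a page whose prices only appear on the third look-up.
# With a real driver, the condition could be a lambda wrapping the
# question's own find_elements_by_xpath(...) call (an assumption, not tested here).
attempts = {"count": 0}


def fake_find_prices():
    attempts["count"] += 1
    return ["$9.25", "$15.89"] if attempts["count"] >= 3 else []


prices = wait_until(fake_find_prices, timeout=2.0, poll=0.01)
print(prices)
```

Note that waiting only helps with elements that load late. A deal that genuinely has no price element will still be missing after any wait, so the rows can still end up misaligned.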
Answer 1 (score: 0)
import time
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
capabilities = {
    'browserName': 'chrome',
    'chromeOptions': {
        'useAutomationExtension': False,
        'forceDevToolsScreenshot': True,
        'args': ['--start-maximized', '--disable-infobars']
    }
}
driver = webdriver.Chrome(executable_path='./chromedriver_2.38.exe', desired_capabilities=capabilities)
# Adjacent string literals are concatenated, so the URL contains no line breaks
driver.get(
    "https://www.amazon.com/gp/goldbox/ref=gbps_ftr_s-4_bedf_page_10?"
    "gb_f_deals1=enforcedCategories:2972638011,"
    "dealStates:AVAILABLE%252CWAITLIST%252CWAITLISTFULL,"
    "includedAccessTypes:,page:10,sortOrder:BY_SCORE,"
    "dealsPerPage:32&pf_rd_p=afc45143-5c9c-4b30-8d5c-d838e760bedf&"
    "pf_rd_s=slot-4&pf_rd_t=701&pf_rd_i=gb_main&"
    "pf_rd_m=ATVPDKIKX0DER&pf_rd_r=ZDV4YBQJFDVR3PAY4ZBS&ie=UTF8"
)
# Fixed pause to let the deals render before querying the page
time.sleep(15)
# Each .singleCell element is the container for one deal
golds = driver.find_elements_by_css_selector(".widgetContainer #widgetContent > div.singleCell")
print("found %d golds" % len(golds))
template = """\
<tr class="border">
<td class="image"><img src="{0}"></td>\
<td class="title"><a href="{1}">{2}</a></td>\
<td class="price">{3}</td>
</tr>"""
lines = []
for gold in golds:
    # Look up every field inside this deal's own container so the values stay aligned
    goldInfo = {}
    goldInfo['title'] = gold.find_element_by_css_selector('#dealTitle > span').text
    goldInfo['link'] = gold.find_element_by_css_selector('#dealTitle').get_attribute('href')
    goldInfo['image'] = gold.find_element_by_css_selector('#dealImage img').get_attribute('src')
    try:
        goldInfo['price'] = gold.find_element_by_css_selector('.priceBlock > span').text
    except NoSuchElementException:
        # find_element (singular) raises when nothing matches, so the fallback text is used
        goldInfo['price'] = 'No price display'
    print(goldInfo['title'])
    line = template.format(goldInfo['image'], goldInfo['link'], goldInfo['title'], goldInfo['price'])
    lines.append(line)
html = """\
<html>
<body>
<table>
{0}
</table>
</body>
</html>\
"""
with open('./result.html', 'w') as f:
    f.write(html.format('\n'.join(lines)))
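The answer above comes without commentary, so here is a small sketch of why looking up fields per deal container keeps the rows aligned while the question's page-wide lists do not. The `deals` data is made up for illustration and stands in for the scraped page; no browser is involved.

```python
# Hypothetical scraped data: the second deal has no price element.
deals = [
    {"title": "Deal A", "price": "$9.25"},
    {"title": "Deal B"},
    {"title": "Deal C", "price": "$31.49"},
]

# Page-wide collection, as in the question: one list per field.
# The missing price simply leaves the prices list one entry short.
titles = [d["title"] for d in deals]
prices = [d["price"] for d in deals if "price" in d]
zipped = list(zip(titles, prices))

# Per-deal collection, as in this answer: each field is read from its own
# record, and a placeholder is substituted when the price is missing.
per_deal = [(d["title"], d.get("price", "No price display")) for d in deals]

print(zipped)    # Deal B is paired with Deal C's price; Deal C is dropped
print(per_deal)  # every deal keeps its own price or the placeholder
```

Because `zip` stops at the shortest list and pairs items purely by position, a single missing price shifts every later price up by one row. Scoping each lookup to one container avoids that without needing the lists to have equal length.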
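Answer 2 (score: 0)
Yes, you can also skip the items that have no price. Here is another approach: build a list of only the available prices together with their corresponding images, as follows:
Code block: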
from selenium import webdriver
options = webdriver.ChromeOptions()
options.add_argument("start-maximized")
options.add_argument('disable-infobars')
browser=webdriver.Chrome(chrome_options=options, executable_path=r'C:\Utility\BrowserDrivers\chromedriver.exe')
browser.get("https://www.amazon.com/gp/goldbox/ref=gbps_ftr_s-4_bedf_page_10?gb_f_deals1=enforcedCategories:2972638011,dealStates:AVAILABLE%252CWAITLIST%252CWAITLISTFULL,includedAccessTypes:,page:10,sortOrder:BY_SCORE,dealsPerPage:32&pf_rd_p=afc45143-5c9c-4b30-8d5c-d838e760bedf&pf_rd_s=slot-4&pf_rd_t=701&pf_rd_i=gb_main&pf_rd_m=ATVPDKIKX0DER&pf_rd_r=ZDV4YBQJFDVR3PAY4ZBS&ie=UTF8")
#finds images
deal_image = browser.find_elements_by_xpath("//div[@class='a-row priceBlock unitLineHeight']/span//preceding::img[1]")
images = []
for image in deal_image:
    images.append(image.get_attribute('src'))
#finds prices
deal_price = browser.find_elements_by_xpath("//div[@class='a-row priceBlock unitLineHeight']/span")
prices = []
for price in deal_price:
    prices.append(price.text)
#print the information
for image, price in zip(images, prices):
    print(image, price)
Console output:
https://images-na.ssl-images-amazon.com/images/I/31zt-ovKJqL._AA210_.jpg $9.25
https://images-na.ssl-images-amazon.com/images/I/610%2BKAfr72L._AA210_.jpg $15.89
https://images-na.ssl-images-amazon.com/images/I/41whkQ1m0uL._AA210_.jpg $31.49
https://images-na.ssl-images-amazon.com/images/I/41cAbUWEdoL._AA210_.jpg $259.58 - $782.99
https://images-na.ssl-images-amazon.com/images/I/51raHLFC8wL._AA210_.jpg $139.56
https://images-na.ssl-images-amazon.com/images/I/41fuZZwdruL._AA210_.jpg $41.24
https://images-na.ssl-images-amazon.com/images/I/51N2rdMSh0L._AA210_.jpg $19.50 - $20.99
https://images-na.ssl-images-amazon.com/images/I/515DbJhCtOL._AA210_.jpg $22.97
https://images-na.ssl-images-amazon.com/images/I/51OzOZrj1rL._AA210_.jpg $109.95
https://images-na.ssl-images-amazon.com/images/I/31-QDRkNbhL._AA210_.jpg $15.80
https://images-na.ssl-images-amazon.com/images/I/41vXJ9fvcIL._AA210_.jpg $88.99
https://images-na.ssl-images-amazon.com/images/I/51fKqo2YfcL._AA210_.jpg $21.85
https://images-na.ssl-images-amazon.com/images/I/31GcGUXz9TL._AA210_.jpg $220.99 - $241.99
https://images-na.ssl-images-amazon.com/images/I/41sROkWjnpL._AA210_.jpg $40.48
https://images-na.ssl-images-amazon.com/images/I/51vXMFtZajL._AA210_.jpg $22.72
https://images-na.ssl-images-amazon.com/images/I/512s5ZrjoFL._AA210_.jpg $51.99
https://images-na.ssl-images-amazon.com/images/I/51A8Nfvf8eL._AA210_.jpg $8.30
https://images-na.ssl-images-amazon.com/images/I/51aDac6YN5L._AA210_.jpg $18.53
https://images-na.ssl-images-amazon.com/images/I/31SQON%2BiOBL._AA210_.jpg $10.07
Link:
https://www.amazon.com/gp/goldbox/ref=gbps_ftr_s-4_bedf_page_10?gb_f_deals1=enforcedCategories:2972638011,dealStates:AVAILABLE%252CWAITLIST%252CWAITLISTFULL,includedAccessTypes:,page:10,sortOrder:BY_SCORE,dealsPerPage:32&pf_rd_p=afc45143-5c9c-4b30-8d5c-d838e760bedf&pf_rd_s=slot-4&pf_rd_t=701&pf_rd_i=gb_main&pf_rd_m=ATVPDKIKX0DER&pf_rd_r=ZDV4YBQJFDVR3PAY4ZBS&ie=UTF8
Browser snapshot: [image not included]