Unable to find element by XPath: skip and write placeholder text

Time: 2018-05-27 20:10:43

Tags: python selenium

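This is a loop that scrapes multiple elements. Sometimes the price cannot be found. Instead of passing in the except block, I need to print/write a value for the cases where there is no price. The reason is that when it simply passes, the variable values no longer line up when I print (title, link, image, price). Hopefully you can see the logic in what I am trying to accomplish. I have also attached a screenshot so you can see what I mean. Thanks in advance.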

enter image description here

#finds titles
deal_title = browser.find_elements_by_xpath("//a[@id='dealTitle']/span")
titles = []
for title in deal_title:
    titles.append(title.text)

#finds links
deal_link = browser.find_elements_by_xpath("//div[@class='a-row dealDetailContainer']/div/a[@id='dealTitle']")
links = []
for link in deal_link:
    links.append(link.get_attribute('href'))

#finds images
deal_image = browser.find_elements_by_xpath("//a[@id='dealImage']/div/div/div/img")
images = []
for image in deal_image:
    images.append(image.get_attribute('src'))

try:

    deal_price = browser.find_elements_by_xpath("//div[@class='a-row priceBlock unitLineHeight']/span")
    prices = []
    for price in deal_price:
        prices.append(price.text)

except NoSuchElementException:
    price = ("PRINT/WRITE THIS TEXT INSTEAD OF PASSING")

#writes to html
for title, link, image, price in zip(titles, links, images, prices):
    f.write("<tr class='border'><td class='image'>" + "<img src=" + image + "></td>" + "<td class='title'><a href=" + link + '>'">" + title + "</a></td><td class='price'>" + price + "</td></tr>")
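
The mismatch described in the question can be reproduced without a browser. `find_elements_*` returns an empty list when nothing matches rather than raising `NoSuchElementException`, so the `except` branch above is unlikely to ever run; the price list simply ends up shorter than the other lists. The sketch below uses made-up deal names and prices (assumptions, not data from the page) to show how `zip` then pairs the wrong values:

```python
# Hypothetical data: three deals, but only two price elements were found.
titles = ["Deal A", "Deal B", "Deal C"]
prices = ["$9.25", "$31.49"]  # "Deal B" has no price element on the page

# zip() pairs items by position and stops at the shortest list, so
# "Deal B" receives Deal C's price and "Deal C" is dropped entirely.
rows = list(zip(titles, prices))
print(rows)

# Keeping exactly one entry per deal (None when the price is missing)
# preserves the alignment and leaves room for a placeholder.
prices_per_deal = ["$9.25", None, "$31.49"]
aligned = [
    (title, price if price is not None else "No price")
    for title, price in zip(titles, prices_per_deal)
]
print(aligned)
```

This suggests collecting the fields deal by deal instead of building four independent page-wide lists.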

3 Answers:

Answer 0 (score: 0)

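If I understand correctly, your problem is that some page elements are not loaded "in time".

The element you want to read may not have loaded yet at the moment you read it.

To prevent this, you can use explicit waits (the script waits up to a specified time for the specified element to load).

With explicit waits in place, you are less likely to miss values.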
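
A minimal sketch of such an explicit wait, assuming the deal cards can be located with the CSS selector that appears in another answer on this page; the helper name `wait_for_deal_cards` and the 15-second default are illustrative choices, not part of the original answer:

```python
def wait_for_deal_cards(driver, timeout=15):
    """Block until at least one deal card is in the DOM, then return all of them."""
    # Imported lazily so that defining this helper does not itself require selenium.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    # Assumed selector for one deal card; adjust it to the actual page markup.
    locator = (By.CSS_SELECTOR, ".widgetContainer #widgetContent > div.singleCell")

    # until() polls the condition until it returns a non-empty list of elements
    # and raises selenium.common.exceptions.TimeoutException after `timeout` seconds.
    return WebDriverWait(driver, timeout).until(
        EC.presence_of_all_elements_located(locator)
    )
```

Typical usage would be `cards = wait_for_deal_cards(browser)` right after `browser.get(...)`. Note that a wait only helps with elements that load late; a deal that genuinely has no price still needs a placeholder.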

Answer 1 (score: 0)

import time
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException

capabilities = {
  'browserName': 'chrome',
  'chromeOptions':  {
    'useAutomationExtension': False,
    'forceDevToolsScreenshot': True,
    'args': ['--start-maximized', '--disable-infobars']
  }
}
driver = webdriver.Chrome(executable_path='./chromedriver_2.38.exe', desired_capabilities=capabilities)

# Adjacent string literals are joined without the newlines and spaces
# that a triple-quoted string would have embedded in the URL.
driver.get(
    "https://www.amazon.com/gp/goldbox/ref=gbps_ftr_s-4_bedf_page_10?"
    "gb_f_deals1=enforcedCategories:2972638011,"
    "dealStates:AVAILABLE%252CWAITLIST%252CWAITLISTFULL,"
    "includedAccessTypes:,page:10,sortOrder:BY_SCORE,"
    "dealsPerPage:32&pf_rd_p=afc45143-5c9c-4b30-8d5c-d838e760bedf&"
    "pf_rd_s=slot-4&pf_rd_t=701&pf_rd_i=gb_main&"
    "pf_rd_m=ATVPDKIKX0DER&pf_rd_r=ZDV4YBQJFDVR3PAY4ZBS&ie=UTF8"
)

time.sleep(15)

golds = driver.find_elements_by_css_selector(".widgetContainer #widgetContent > div.singleCell")
print("found %d golds" % len(golds))  

template = """\
    <tr class="border">
        <td class="image"><img src="{0}"></td>\
        <td class="title"><a href="{1}">{2}</a></td>\
        <td class="price">{3}</td>
    </tr>"""

lines = []

for gold in golds:
    goldInfo = {}

    goldInfo['title'] = gold.find_element_by_css_selector('#dealTitle > span').text
    goldInfo['link'] = gold.find_element_by_css_selector('#dealTitle').get_attribute('href')
    goldInfo['image'] = gold.find_element_by_css_selector('#dealImage img').get_attribute('src')

    try:
        goldInfo['price'] = gold.find_element_by_css_selector('.priceBlock > span').text
    except NoSuchElementException:
        goldInfo['price'] = 'No price display'

    print(goldInfo['title'])

    line = template.format(goldInfo['image'], goldInfo['link'], goldInfo['title'], goldInfo['price'])
    lines.append(line)

html = """\
    <html>
        <body>
            <table>
                {0}
            </table>
        </body>
    </html>\
"""

# The context manager closes the file even if writing fails.
with open('./result.html', 'w') as f:
    f.write(html.format('\n'.join(lines)))
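
The per-deal lookup above relies on `find_element_by_css_selector`, a helper family that newer Selenium releases (4.3 and later) no longer provide. A hedged alternative, sketched below, uses the generic `find_elements(by, value)` call and checks for an empty result instead of catching an exception. The function name `price_or_placeholder` is hypothetical, and the string constant stands in for `By.CSS_SELECTOR` so the sketch can be read and run without Selenium installed:

```python
# In real code, use: from selenium.webdriver.common.by import By
# By.CSS_SELECTOR has this same string value.
CSS = "css selector"


def price_or_placeholder(card, placeholder="No price display"):
    """Return the price text found inside one deal card, or a placeholder.

    `card` is assumed to be a WebElement for a single deal (or any object
    exposing a compatible find_elements(by, value) method).
    """
    # find_elements returns an empty list when nothing matches,
    # so no try/except is needed for the missing-price case.
    matches = card.find_elements(CSS, ".priceBlock > span")
    return matches[0].text if matches else placeholder
```

Inside the loop this would read `goldInfo['price'] = price_or_placeholder(gold)`. Because the lookup is scoped to one card, each price stays attached to its own title, link, and image.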

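Answer 2 (score: 0)

You can also skip the deals without a price. Here is another approach: build a list of the available prices and a list of the corresponding images, as follows: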

  • Code block:

    from selenium import webdriver
    
    options = webdriver.ChromeOptions() 
    options.add_argument("start-maximized")
    options.add_argument('disable-infobars')
    browser=webdriver.Chrome(chrome_options=options, executable_path=r'C:\Utility\BrowserDrivers\chromedriver.exe')
    browser.get("https://www.amazon.com/gp/goldbox/ref=gbps_ftr_s-4_bedf_page_10?gb_f_deals1=enforcedCategories:2972638011,dealStates:AVAILABLE%252CWAITLIST%252CWAITLISTFULL,includedAccessTypes:,page:10,sortOrder:BY_SCORE,dealsPerPage:32&pf_rd_p=afc45143-5c9c-4b30-8d5c-d838e760bedf&pf_rd_s=slot-4&pf_rd_t=701&pf_rd_i=gb_main&pf_rd_m=ATVPDKIKX0DER&pf_rd_r=ZDV4YBQJFDVR3PAY4ZBS&ie=UTF8")
    #finds images
    deal_image = browser.find_elements_by_xpath("//div[@class='a-row priceBlock unitLineHeight']/span//preceding::img[1]")
    images = []
    for image in deal_image:
        images.append(image.get_attribute('src'))
    
    #finds prices
    deal_price = browser.find_elements_by_xpath("//div[@class='a-row priceBlock unitLineHeight']/span")
    prices = []
    for price in deal_price:
        prices.append(price.text)
    
    #print the information
    for image, price in zip(images, prices):
        print(image, price) 
    
  • Console output:

    https://images-na.ssl-images-amazon.com/images/I/31zt-ovKJqL._AA210_.jpg $9.25
    https://images-na.ssl-images-amazon.com/images/I/610%2BKAfr72L._AA210_.jpg $15.89
    https://images-na.ssl-images-amazon.com/images/I/41whkQ1m0uL._AA210_.jpg $31.49
    https://images-na.ssl-images-amazon.com/images/I/41cAbUWEdoL._AA210_.jpg $259.58 - $782.99
    https://images-na.ssl-images-amazon.com/images/I/51raHLFC8wL._AA210_.jpg $139.56
    https://images-na.ssl-images-amazon.com/images/I/41fuZZwdruL._AA210_.jpg $41.24
    https://images-na.ssl-images-amazon.com/images/I/51N2rdMSh0L._AA210_.jpg $19.50 - $20.99
    https://images-na.ssl-images-amazon.com/images/I/515DbJhCtOL._AA210_.jpg $22.97
    https://images-na.ssl-images-amazon.com/images/I/51OzOZrj1rL._AA210_.jpg $109.95
    https://images-na.ssl-images-amazon.com/images/I/31-QDRkNbhL._AA210_.jpg $15.80
    https://images-na.ssl-images-amazon.com/images/I/41vXJ9fvcIL._AA210_.jpg $88.99
    https://images-na.ssl-images-amazon.com/images/I/51fKqo2YfcL._AA210_.jpg $21.85
    https://images-na.ssl-images-amazon.com/images/I/31GcGUXz9TL._AA210_.jpg $220.99 - $241.99
    https://images-na.ssl-images-amazon.com/images/I/41sROkWjnpL._AA210_.jpg $40.48
    https://images-na.ssl-images-amazon.com/images/I/51vXMFtZajL._AA210_.jpg $22.72
    https://images-na.ssl-images-amazon.com/images/I/512s5ZrjoFL._AA210_.jpg $51.99
    https://images-na.ssl-images-amazon.com/images/I/51A8Nfvf8eL._AA210_.jpg $8.30
    https://images-na.ssl-images-amazon.com/images/I/51aDac6YN5L._AA210_.jpg $18.53
    https://images-na.ssl-images-amazon.com/images/I/31SQON%2BiOBL._AA210_.jpg $10.07
    
  • Link:

    https://www.amazon.com/gp/goldbox/ref=gbps_ftr_s-4_bedf_page_10?gb_f_deals1=enforcedCategories:2972638011,dealStates:AVAILABLE%252CWAITLIST%252CWAITLISTFULL,includedAccessTypes:,page:10,sortOrder:BY_SCORE,dealsPerPage:32&pf_rd_p=afc45143-5c9c-4b30-8d5c-d838e760bedf&pf_rd_s=slot-4&pf_rd_t=701&pf_rd_i=gb_main&pf_rd_m=ATVPDKIKX0DER&pf_rd_r=ZDV4YBQJFDVR3PAY4ZBS&ie=UTF8
    
  • Browser snapshot:

price_picture