Selenium WebDriver - Unable to get all image src URLs from amazon.co.uk via XPath

Time: 2019-03-25 13:36:37

Tags: python python-2.7 selenium xpath web-scraping

I am trying to get the links to all the images shown for this product: https://www.amazon.co.uk/Autoglym-AG-035001-Interior-Shampoo/dp/B00114WOBC/ref=sr_1_1?ie=UTF8&qid=1553519250&sr=8-1&keywords=715933155337

However, I only get a single image URL back.

When I check the length of the list (product_image_url2), I do not get the 6 WebElements I expect.

# XPath that walks down from the main image container
product_image_url2 = self.browser.find_elements_by_xpath('//*[@id="main-image-container"]/ul/li/span/span/div/img')
product_image_url2_count = len(product_image_url2)
print product_image_url2_count

# Collect the src attribute of every matched <img>
image_url2 = []
for curr_product_image_url2 in product_image_url2:
    image_url2.append(curr_product_image_url2.get_attribute("src"))
product_dict['image url2'] = image_url2

3 Answers:

Answer 0 (score: 0)

This is the correct XPath for the img elements in the thumbnail strip on the left.

//li[@class='a-spacing-small item imageThumbnail a-declarative']//img

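Code and output below:

from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

# Assumes `driver` is an open WebDriver that has already loaded the product page.
thumbnail_xpath = "//li[@class='a-spacing-small item imageThumbnail a-declarative']//img"
wait = WebDriverWait(driver, 10)  # the 10-second timeout is an assumed value

# Wait until at least one thumbnail is present, then fetch all of them.
wait.until(EC.presence_of_element_located((By.XPATH, thumbnail_xpath)))
product_image_url2 = driver.find_elements(By.XPATH, thumbnail_xpath)
product_image_url2_count = len(product_image_url2)
print(product_image_url2_count)

image_url2 = []
for curr_product_image_url2 in product_image_url2:
    print(curr_product_image_url2.get_attribute("src"))
    image_url2.append(curr_product_image_url2.get_attribute("src"))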

Output:

6
https://images-na.ssl-images-amazon.com/images/I/31JLKXyjA5L.SS40.jpg
https://images-na.ssl-images-amazon.com/images/I/51ZZMf1JVfL.SS40.jpg
https://images-na.ssl-images-amazon.com/images/I/416%2BBQU%2BtuL.SS40.jpg
https://images-na.ssl-images-amazon.com/images/I/41CdeeG0HGL.SS40.jpg
https://images-na.ssl-images-amazon.com/images/I/41bZb0qgNPL.SS40.jpg
https://images-na.ssl-images-amazon.com/images/I/219h80ACoQL.SS40.jpg
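To illustrate why the XPath from the question and the one above return different counts, here is a minimal sketch that uses Python's standard-library xml.etree.ElementTree (which supports a subset of XPath) instead of a live browser. The markup and file names are hypothetical simplifications and are not Amazon's actual DOM: the point is only that a path rooted at the main image container reaches one img, while a path rooted at the thumbnail li elements reaches one img per thumbnail.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified markup (NOT Amazon's real page structure).
SAMPLE = """
<div>
  <div id="main-image-container">
    <ul><li><span><span><div><img src="main.jpg"/></div></span></span></li></ul>
  </div>
  <div id="altImages">
    <ul>
      <li class="a-spacing-small item imageThumbnail a-declarative"><span><img src="thumb1.jpg"/></span></li>
      <li class="a-spacing-small item imageThumbnail a-declarative"><span><img src="thumb2.jpg"/></span></li>
      <li class="a-spacing-small item imageThumbnail a-declarative"><span><img src="thumb3.jpg"/></span></li>
    </ul>
  </div>
</div>
""".strip()

root = ET.fromstring(SAMPLE)

# ElementTree paths are relative, so they start with ".//" instead of "//".
main_path = ".//*[@id='main-image-container']/ul/li/span/span/div/img"
thumb_path = ".//li[@class='a-spacing-small item imageThumbnail a-declarative']//img"

main_srcs = [img.get("src") for img in root.findall(main_path)]
thumb_srcs = [img.get("src") for img in root.findall(thumb_path)]

print(len(main_srcs), main_srcs)
print(len(thumb_srcs), thumb_srcs)
```

Note that `@class='...'` is an exact string comparison, so the XPath stops matching if the page adds, removes, or reorders any class name on the li elements.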

Answer 1 (score: 0)

Your XPath is wrong. Try the following XPath.

//span[@id="a-autoid-8-announce"]/img

The code below uses it to collect the image URLs into a list:

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get('https://www.amazon.co.uk/Autoglym-AG-035001-Interior-Shampoo/dp/B00114WOBC/ref=sr_1_1?ie=UTF8&qid=1553519250&sr=8-1&keywords=715933155337')

# Find the matching img elements and count them.
product_image_url2 = driver.find_elements(By.XPATH, '//span[@id="a-autoid-8-announce"]/img')
product_image_url2_count = len(product_image_url2)
print(product_image_url2_count)

# Collect the src attribute of every matched element.
image_url2 = []
for curr_product_image_url2 in product_image_url2:
    image_url2.append(curr_product_image_url2.get_attribute("src"))

print(image_url2)

If you want to store the URLs in a dictionary, try the following code.

# Map each list index to its image URL.
product_dict = {}
for i in range(len(image_url2)):
    product_dict[i] = image_url2[i]

print(product_dict)
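As a variation on the collection loop, here is a rough sketch of a small helper that gathers src values while skipping elements without a src and dropping duplicates. The helper name `collect_image_urls` and the `FakeElement` class are hypothetical and exist only for illustration; `FakeElement` stands in for a Selenium WebElement so the snippet runs without a browser. The result is stored under the 'image url2' key, as in the question.

```python
def collect_image_urls(elements):
    """Return the non-empty src values of the elements, in order, without duplicates."""
    urls = []
    for element in elements:
        src = element.get_attribute("src")
        if src and src not in urls:
            urls.append(src)
    return urls


class FakeElement(object):
    """Hypothetical stand-in for a Selenium WebElement (demonstration only)."""

    def __init__(self, src):
        self._src = src

    def get_attribute(self, name):
        # Only the src attribute is modelled here.
        return self._src if name == "src" else None


# With Selenium, `elements` would be the list returned by find_elements.
elements = [FakeElement("a.jpg"), FakeElement(None), FakeElement("b.jpg"), FakeElement("a.jpg")]

image_url2 = collect_image_urls(elements)
product_dict = {"image url2": image_url2}
print(product_dict)
```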

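Answer 2 (score: 0)

You can match them with a CSS selector:

#altImages img:not([alt])

# is an ID selector. img is a type selector (for the tag). The space between the two is a descendant combinator, which means img is a descendant of the element with the ID altImages. :not([alt]) specifies that the img must not have an alt attribute. [] is an attribute selector and :not is a CSS pseudo-class.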
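To make the selector's meaning concrete, here is a minimal sketch that reproduces the same matching rule in plain Python with the standard-library xml.etree.ElementTree, so it runs without a browser. The markup and file names are hypothetical and are not Amazon's actual DOM; the filter mirrors "an img that is a descendant of the element with ID altImages and has no alt attribute".

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup (NOT Amazon's real page structure).
SAMPLE = """
<div>
  <div id="altImages">
    <ul>
      <li><span><img src="thumb1.jpg"/></span></li>
      <li><span><img src="thumb2.jpg"/></span></li>
      <li><span><img src="video.jpg" alt="Video thumbnail"/></span></li>
    </ul>
  </div>
  <img src="elsewhere.jpg"/>
</div>
""".strip()

root = ET.fromstring(SAMPLE)

# "#altImages": the element whose id attribute is altImages.
container = root.find(".//*[@id='altImages']")

# " img": every img descendant at any depth (descendant combinator).
# ":not([alt])": keep only those that have no alt attribute at all.
matched = [img.get("src") for img in container.iter("img") if img.get("alt") is None]

print(matched)
```

In Selenium itself the selector string would be passed unchanged to `find_elements` with the `By.CSS_SELECTOR` strategy, and the browser would do the matching.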
