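I'm trying to scrape some basic information from an Amazon search page. The XPath I'm using seems to be correct, but the code below only gives me the first result on every iteration of the for loop: basically the title of the first book, repeated once for each search result on page 1. Am I doing something wrong?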
from selenium import webdriver
from time import sleep
PATH = 'ChromeDriver/chromedriver'
driver = webdriver.Chrome(PATH)
driver.get('https://www.amazon.in/s?k=python+books&ref=nb_sb_noss')
sleep(2)
entries = driver.find_elements_by_xpath('//div[contains(@data-cel-widget, "search_result_")]')
for entry in entries:
    title = entry.find_element_by_xpath('//span[@class = "a-size-medium a-color-base a-text-normal"]')
    print(title.text)
Answer 0 (score: 1)
You don't need the entries locator. Just loop over the results directly:
for entry in driver.find_elements_by_xpath("//span[@class = 'a-size-medium a-color-base a-text-normal']"):
    print(entry.text)
Prints:
Learning with Python
Machine Learning using Python
Python Crash Course, 2nd Edition: A Hands-On, Project-Based Introduction to Programming
Python: This Book Includes: Learn Python Programming + Python Coding and Programming + Python Coding. Everything you need to know to Learn Coding ... Machine Learning, Data Science and more ....
Python Programming: Using Problem Solving Approach
Automate the Boring Stuff with Python, 2nd Edition: Practical Programming for Total Beginners
...
Updated solution
Here is one way to parse each result and assign variable names to the different parts. Note that the authors and the date are actually in the same element, so they are printed together.
for entry in driver.find_elements_by_xpath("//div[@data-component-type='s-search-result']"):
    title = entry.find_element_by_xpath(".//span[@class = 'a-size-medium a-color-base a-text-normal']").text
    authors = entry.find_element_by_xpath(".//div[@class='a-row a-size-base a-color-secondary']").get_attribute("innerText")
    print(title)
    print(authors)
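Prints:
Learning with Python
by Allen Downey , Jeffrey Elkner, et al. | 1 January 2015
Machine Learning using Python
by U Dinesh Kumar Manaranjan Pradhan | 1 January 2019
...
Also note that every child lookup inside the loop starts with a dot (.//). The dot is required; otherwise the search goes back to the document root each time, which I think is the problem you originally ran into.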
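To see the root-versus-relative difference without a browser, here is a minimal sketch that uses the standard library's xml.etree.ElementTree as a stand-in for Selenium. The PAGE markup and the "title" class are hypothetical simplifications, not Amazon's real HTML. ElementTree refuses an absolute path on an element, so the "back to the root" case is written explicitly as a search from root, which is effectively what a leading // does in the question's loop.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified stand-in for the search results page.
PAGE = """
<html><body>
  <div data-component-type="s-search-result"><span class="title">Learning with Python</span></div>
  <div data-component-type="s-search-result"><span class="title">Machine Learning using Python</span></div>
</body></html>
"""

root = ET.fromstring(PAGE)
entries = root.findall(".//div[@data-component-type='s-search-result']")

# Searching from the document root on every iteration:
# the first match is always the first title on the page.
from_root = [root.find(".//span[@class='title']").text for _ in entries]

# Searching relative to the current entry: each entry yields its own title.
from_entry = [entry.find(".//span[@class='title']").text for entry in entries]

print(from_root)   # ['Learning with Python', 'Learning with Python']
print(from_entry)  # ['Learning with Python', 'Machine Learning using Python']
```

The first list reproduces the symptom from the question (the first title repeated once per result), and the second is what the .// version returns.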
Answer 1 (score: 1)
How about this? I grabbed the text from each entry. To make it easier to read, I also replaced all the newlines with commas.
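A minimal sketch of that idea, under some assumptions: the helper names entry_text_to_line and print_entries are made up for illustration, and dropping blank lines is an extra choice on top of the plain newline-to-comma replacement described above. print_entries only needs objects with a .text attribute, which Selenium WebElements have.

```python
def entry_text_to_line(text: str) -> str:
    """Collapse the multi-line text of one search result into one comma-separated line."""
    # Assumption: blank lines carry no information, so they are dropped.
    parts = [part.strip() for part in text.splitlines() if part.strip()]
    return ", ".join(parts)


def print_entries(entries) -> None:
    """Print one line per entry; `entries` is any iterable of objects with a `.text` attribute."""
    for entry in entries:
        print(entry_text_to_line(entry.text))


sample = "Learning with Python\nby Allen Downey\n\n1 January 2015"
print(entry_text_to_line(sample))
```

With the driver from the question, it could be called as print_entries(driver.find_elements_by_xpath("//div[@data-component-type='s-search-result']")).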
Answer 2 (score: 0)
You are using the wrong locator for the entries.
Use this one: //div[@data-component-type='s-search-result' and (not(contains(@class,'AdHolder')))]
So entries = driver.find_elements_by_xpath("//div[@data-component-type='s-search-result' and (not(contains(@class,'AdHolder')))]")
With this locator, the rest is fine as it is.
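For illustration, the effect of the not(contains(@class,'AdHolder')) condition can be sketched without a browser. This uses xml.etree.ElementTree from the standard library, which does not support contains() in its XPath subset, so the same substring check is done in Python instead. The PAGE markup and the "s-result-item" class are hypothetical; only AdHolder and data-component-type come from the locator above.

```python
import xml.etree.ElementTree as ET

# Hypothetical simplified page: one sponsored result and one regular result.
PAGE = """
<body>
  <div data-component-type="s-search-result" class="s-result-item AdHolder"><span>Sponsored book</span></div>
  <div data-component-type="s-search-result" class="s-result-item"><span>Learning with Python</span></div>
</body>
"""

root = ET.fromstring(PAGE)
results = root.findall(".//div[@data-component-type='s-search-result']")

# Same substring test as not(contains(@class, 'AdHolder')), done in Python.
organic = [div for div in results if "AdHolder" not in div.get("class", "")]

titles = [div.find(".//span").text for div in organic]
print(titles)  # ['Learning with Python']
```

Note that the title lookup inside each entry is still written relative to the entry (.//span), as in the accepted approach in Answer 0.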