Getting text from a list item that has no id or class using Selenium

Posted: 2018-11-13 12:20:03

Tags: python python-3.x selenium selenium-webdriver web-scraping

I don't understand why the list I'm trying to extract text from comes back blank when I'm sure I'm using the correct XPath. Here is my code:

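from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Firefox()
driver.get("https://www.omegawatches.com/watch-omega-specialities-first-omega-wrist-chronograph-51652483004001")
# find_elements returns a list (empty if the XPath matches nothing), so read .text from each item
betweenLugs = driver.find_elements(By.XPATH, "/html/body/div[2]/main/div[3]/div/div/div[2]/div/div[2]/div[3]/div/ul/li[1]")
print([el.text for el in betweenLugs])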

This should get the first list item and its measurement:

Between lugs: 20 mm 

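I've tried other approaches as well, but the fact that the XPath isn't picking anything up tells me something is wrong: whatever I try, I can't get the text out of the list. Does anyone know what I'm doing wrong? This is the first time I've run into this problem.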

5 answers:

Answer 0 (score: 3)

The XPath is wrong: it breaks at /div[2], which doesn't match anything. This is an example of why you shouldn't use absolute paths.

That section has an id attribute, so use it:

betweenLugs = driver.find_elements(By.XPATH, "//*[@id='product-info-data-5bea7fa7406d7']/ul/li[1]")[0]
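To see why the absolute path is fragile, here is an offline sketch using Python's standard xml.etree.ElementTree on a small, hypothetical page (the markup, the id "specs" and the extra "banner" div are invented for illustration; this is not the Omega page, and ElementTree supports only a subset of XPath). An absolute path counts sibling positions, so one extra element above the target silently breaks it, while a path anchored on an id keeps matching.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified markup; NOT the real Omega page structure.
ORIGINAL = """
<html><body>
  <div id="header"/>
  <div>
    <section id="specs">
      <ul><li>Between lugs: 20 mm</li><li>Case diameter: 39 mm</li></ul>
    </section>
  </div>
</body></html>
"""

# The same page after a (hypothetical) change adds one element at the top of <body>.
REDESIGNED = ORIGINAL.replace("<body>", "<body><div id='banner'/>", 1)

# Paths are relative to the root <html> element.
ABSOLUTE = "body/div[2]/section/ul/li[1]"   # counts sibling positions: brittle
ANCHORED = ".//*[@id='specs']/ul/li[1]"     # anchored on an id: survives the change


def first_text(markup, path):
    """Return the text of the first element matching path, or None if nothing matches."""
    root = ET.fromstring(markup.strip())
    el = root.find(path)
    return None if el is None else el.text


for label, markup in (("original", ORIGINAL), ("redesigned", REDESIGNED)):
    print(label, first_text(markup, ABSOLUTE), "|", first_text(markup, ANCHORED))
```

After the extra element is added, div[2] points at the header div, so the absolute path finds nothing while the id-anchored path still returns the list item.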

You may also need to add an explicit wait for the page to load:

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions

betweenLugs = WebDriverWait(driver, 10).until(expected_conditions.visibility_of_element_located((By.XPATH, "//*[@id='product-info-data-5bea7fa7406d7']/ul/li[1]")))
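WebDriverWait(driver, 10).until(...) works by polling: it re-evaluates the condition (by default every 0.5 s, ignoring NoSuchElementException) until the condition returns something truthy, which until() then returns, or until 10 seconds pass, at which point it raises TimeoutException. visibility_of_element_located returns the element itself once it is visible, so betweenLugs ends up as a WebElement. Below is a simplified pure-Python model of that loop, not Selenium's source; wait_until and fake_locate are hypothetical names, and LookupError/TimeoutError stand in for Selenium's exceptions.

```python
import time


def wait_until(condition, timeout=10.0, poll=0.5, ignored=(LookupError,)):
    """Call condition() until it returns a truthy value or timeout seconds pass.

    Simplified model of an explicit wait; NOT Selenium's implementation.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            value = condition()
            if value:
                return value          # like until(): hand back the truthy result
        except ignored:
            pass                      # "not found yet" is treated like a falsy result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(poll)


calls = {"n": 0}


def fake_locate():
    # Hypothetical stand-in for a DOM lookup that only succeeds on the 3rd attempt.
    calls["n"] += 1
    if calls["n"] < 3:
        raise LookupError("no such element yet")
    return "Between lugs: 20 mm"


result = wait_until(fake_locate, timeout=2, poll=0.01)
print(result)
```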

Answer 1 (score: 2)

The page already has jQuery loaded, so you can do:

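# Raw string so Python leaves the regex's \s untouched for JavaScript
between_lugs = driver.execute_script(r"return jQuery('li:contains(Between lugs)').text().trim().replace(/\s+/g, ' ')")
print(between_lugs)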

You can experiment with selectors in the Chrome DevTools console, which makes this much easier.
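For readers less familiar with jQuery: li:contains(Between lugs) keeps the li elements whose text includes that phrase (case-sensitive), .text() concatenates the text of all matches, .trim() strips the ends, and .replace(/\s+/g, ' ') collapses every whitespace run into one space. A rough Python equivalent over strings you have already extracted (for example with get_attribute('textContent')) might look like this; the function name and sample strings are hypothetical.

```python
import re


def jquery_like_text(texts, needle):
    """Mimic jQuery('li:contains(needle)').text().trim().replace(/\\s+/g, ' ').

    texts: the raw text of each <li>, already extracted (hypothetical input).
    """
    # :contains keeps only items whose text includes needle (case-sensitive);
    # .text() on a jQuery collection concatenates the text of every match.
    combined = "".join(t for t in texts if needle in t)
    # .trim(), then collapse each whitespace run into a single space.
    return re.sub(r"\s+", " ", combined.strip())


# Hypothetical raw li texts with the newlines and indentation typical of page source.
li_texts = [
    "\n      Between lugs:\n      20 mm\n    ",
    "\n      Case diameter:\n      39 mm\n    ",
]
print(jquery_like_text(li_texts, "Between lugs"))
```

Because .text() joins every match, a needle that appears in several list items would return all of them run together, so a specific phrase like "Between lugs" is safer than a generic one.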

Answer 2 (score: 1)

OK, try this and see whether it solves the problem:

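from selenium.webdriver.common.by import By

# Selenium 4 deprecated (and later removed) find_element_by_xpath; use find_element(By.XPATH, ...)
# First: the element whose text contains the label
between_lugs = driver.find_element(By.XPATH, "//*[contains(text(), 'Between lugs')]").get_attribute("innerHTML")
# Then: the first <span> under that element's parent, which holds the value
between_lugs_value = driver.find_element(By.XPATH, "//*[contains(text(), 'Between lugs')]/../span").get_attribute("innerHTML")

final_text = between_lugs + " " + between_lugs_value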
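The two XPaths above assume a particular shape: a label element whose parent also contains a span with the value. The offline sketch below emulates that lookup with xml.etree.ElementTree on hypothetical markup (the strong/span structure is invented; the real page may differ). Two caveats follow from the XPath itself: innerHTML returns serialized HTML, so any child tags or entities would appear in the result, and if the label were itself a span, "/../span" would return the label first.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup shaped the way the XPaths above expect: a label element
# (here <strong>) whose parent also contains a <span> holding the value.
MARKUP = """
<ul>
  <li><strong>Between lugs:</strong> <span>20 mm</span></li>
  <li><strong>Case diameter:</strong> <span>39 mm</span></li>
</ul>
"""


def label_and_value(markup, needle):
    root = ET.fromstring(markup.strip())
    # ElementTree elements have no parent pointer, so build a child -> parent map for "..".
    parent_of = {child: parent for parent in root.iter() for child in parent}
    for el in root.iter():
        # Roughly //*[contains(text(), needle)]: test the element's leading text node.
        if el.text and needle in el.text and el in parent_of:
            value = parent_of[el].find("span")  # roughly "/../span": first <span> child of the parent
            if value is not None:
                return el.text + " " + value.text  # same concatenation as final_text
    return None


print(label_and_value(MARKUP, "Between lugs"))
```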

Answer 3 (score: 1)

Another, possibly simpler, approach is the following:


from contextlib import closing
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import ui

url = "https://www.omegawatches.com/watch-omega-specialities-first-omega-wrist-chronograph-51652483004001"

with closing(webdriver.Chrome()) as wd:
    wait = ui.WebDriverWait(wd, 10)
    wd.get(url)
    # Wait for the first <li> inside the technical-data block, then read its full textContent
    item = wait.until(lambda d: d.find_element(By.XPATH, "//*[contains(@class,'technical-data')]//li")).get_attribute('textContent')
    # Collapse newlines and indentation into single spaces
    print(' '.join(item.split()))
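Reading textContent rather than .text is a deliberate choice here: WebElement.text returns only the rendered, visible text, so it can come back empty for an element that is not currently displayed, while the textContent DOM property concatenates every descendant text node regardless of CSS. The trade-off is that textContent keeps the raw newlines and indentation from the source, hence the split/join. The offline sketch below shows the same effect with ElementTree's itertext() on a hypothetical fragment (only the "technical-data" class name comes from the answer's XPath).

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: only the 'technical-data' class comes from the answer's XPath.
FRAGMENT = """
<div class="technical-data">
  <ul>
    <li>
      Between lugs:
      <span>20 mm</span>
    </li>
  </ul>
</div>
"""

root = ET.fromstring(FRAGMENT.strip())
li = root.find(".//li")  # roughly //*[contains(@class,'technical-data')]//li

# textContent concatenates all descendant text nodes; itertext() yields the same pieces here.
text_content = "".join(li.itertext())
print(repr(text_content))                    # raw: keeps newlines and indentation
normalized = " ".join(text_content.split())  # same normalization as the answer
print(normalized)
```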

Answer 4 (score: 0)

Scroll down, then use an explicit wait with a CSS selector to locate the parent li:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions

driver = webdriver.Chrome() #Firefox()
driver.get("https://www.omegawatches.com/watch-omega-specialities-first-omega-wrist-chronograph-51652483004001")
driver.execute_script("window.scrollTo(0, 2000)") 
betweenLugs = WebDriverWait(driver, 10).until(expected_conditions.visibility_of_element_located((By.CSS_SELECTOR, "#product-info-data-5beaf5497d916 > ul > li:nth-child(1)")))

print(betweenLugs.text)
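Note that Answer 0 uses the id product-info-data-5bea7fa7406d7 while this answer uses product-info-data-5beaf5497d916 for the same section. That suggests (an assumption, not verified against the live site) that the suffix is generated, so hard-coding it may break on a later visit. Matching only the prefix avoids this: in CSS, [id^='product-info-data-'] > ul > li:nth-child(1); in XPath 1.0, //*[starts-with(@id, 'product-info-data-')]/ul/li[1]. Either string could be passed to By.CSS_SELECTOR or By.XPATH. ElementTree has no prefix operator, so the offline sketch below does the prefix test in Python on hypothetical markup.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup; only the id prefix and suffix are taken from the answers above.
PAGE = """
<main>
  <div id="product-info-data-5beaf5497d916">
    <ul><li>Between lugs: 20 mm</li><li>Case diameter: 39 mm</li></ul>
  </div>
</main>
"""

PREFIX = "product-info-data-"


def first_spec(markup, prefix=PREFIX):
    """Return the first list item under the element whose id starts with prefix."""
    root = ET.fromstring(markup.strip())
    for el in root.iter():
        if el.get("id", "").startswith(prefix):  # like [id^='...'] or starts-with(@id, ...)
            li = el.find("ul/li[1]")             # roughly > ul > li:nth-child(1)
            if li is not None:
                return li.text
    return None


# The lookup keeps working when the generated suffix changes.
regenerated = PAGE.replace("5beaf5497d916", "5bea7fa7406d7")
print(first_spec(PAGE), "|", first_spec(regenerated))
```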