Question

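I have a problem extracting the second element of a class with Selenium, specifically with XPath. Looking at the image (sorry for not posting the HTML as code, but it would be too long), I want to extract the second item with the class "field-content", which holds the date (June 4, 2018). However, there is another element before it that also has the class "field-content": Cultures and Identities in Europe. So I'm having trouble extracting the date and putting it into a DataFrame, as I try to do in my code.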

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException
import pandas as pd

option = webdriver.ChromeOptions()
browser = webdriver.Chrome(executable_path=r'C:xxx', chrome_options=option)
url = "https://www.mooc-list.com/countries/italy"
browser.get(url)  # url must be defined before browser.get() is called

# One div per course listing
titles_element = browser.find_elements_by_xpath("//div[starts-with(@class, 'views-row views-row-')]")
titles = [x.text for x in titles_element]
for i in titles_element:
    newtitle = i.find_elements_by_xpath("//div[@class='views-field views-field-title']")
moocstitle = [x.text for x in newtitle]
for i in titles_element:
    # Matches every span with class 'field-content': course titles as well as dates
    area = i.find_elements_by_xpath("//span[@class='field-content']")
areas = [x.text for x in area]
moocs = pd.DataFrame({'moocs': moocstitle,
                      'areas': areas})

moocs.head(10)

Answer 1

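If I see it correctly, the list of class names on the element that contains the date text is unique. You can select the date text in two ways: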

Select by one class name contained among several class names:

# if the class name has spaces on both sides
//div[contains(concat(' ', normalize-space(@class), ' '), ' test-class ')]/span

# if the class name only has a space on its left side, i.e. it is the last class in the attribute (your example)
//div[contains(concat(' ', normalize-space(@class)), ' views-field-field-start-date-text')]/span
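The token-matching idiom can be reproduced outside the browser. This is a minimal standard-library Python sketch (the helper names and the second class string are hypothetical) that emulates `contains(concat(' ', normalize-space(@class), ' '), ' name ')`. It shows that padding both the attribute and the class name with spaces matches whole class tokens only, while left padding alone also matches a longer class that merely begins with the same text.

```python
def xpath_class_contains(class_attr: str, needle: str, pad_right: bool = True) -> bool:
    """Emulate contains(concat(' ', normalize-space(@class)[, ' ']), needle)."""
    # normalize-space(): strip and collapse runs of whitespace into one space
    normalized = " ".join(class_attr.split())
    haystack = " " + normalized + (" " if pad_right else "")
    return needle in haystack


def has_class(class_attr: str, name: str) -> bool:
    """Whole-token test: attribute and class name are both padded on both sides."""
    return xpath_class_contains(class_attr, " " + name + " ")


date_div = "views-field  views-field-field-start-date-text"
# Hypothetical class that merely starts with the same text
other_div = "views-field views-field-field-start-date-text-extra"

print(has_class(date_div, "views-field-field-start-date-text"))   # True
print(has_class(other_div, "views-field-field-start-date-text"))  # False
# Left padding only: also matches the longer, unrelated class name
print(xpath_class_contains(other_div, " views-field-field-start-date-text", pad_right=False))  # True
```

The left-padded variant is therefore safe only as long as no class on the element starts with the same text.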

Select by the exact value of the tag's whole class attribute:

//div[@class='views-field views-field-field-start-date-text']/span
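`@class='…'` compares the whole attribute string, so it only matches when the class list is exactly that text. Python's standard-library `xml.etree.ElementTree` supports the same `[@attr='value']` predicate in `findall()`, so the difference can be tried without a browser. The markup below is a hypothetical, simplified fragment modelled on the question's description, not copied from mooc-list.com.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup for one listing row (well-formed XML so ElementTree can parse it)
ROW = """
<div class="views-row views-row-1">
  <div class="views-field views-field-title">
    <span class="field-content"><a href="#">Cultures and Identities in Europe</a></span>
  </div>
  <div class="views-field views-field-field-start-date-text">
    <span class="field-content">June 4, 2018</span>
  </div>
</div>
"""

root = ET.fromstring(ROW.strip())

# Exact comparison of the full class attribute
exact = root.findall(".//div[@class='views-field views-field-field-start-date-text']/span")
print([s.text for s in exact])  # ['June 4, 2018']

# A single class name is not enough: the attribute must match character for character
partial = root.findall(".//div[@class='views-field-field-start-date-text']/span")
print(partial)  # []
```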

Answer 2

You can try this XPath:

//a[contains(text(),'Cultures and Identities')]/ancestor::div[contains(@class,'field-title')]/following-sibling::div[contains(@class,'start')]/span
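This XPath starts from the course link, climbs to the enclosing title `div` with `ancestor::`, and then moves sideways to the date `div` with `following-sibling::`. `xml.etree.ElementTree` has no parent pointers or sibling axes, so the minimal sketch below emulates those steps by hand with a child-to-parent map; `contains()` becomes Python's substring test. The function name `date_for_title` and the two-row markup are hypothetical, and unlike the XPath it returns only the first match.

```python
import xml.etree.ElementTree as ET

# Hypothetical two-row listing (well-formed XML so ElementTree can parse it)
PAGE = """
<div class="view-content">
  <div class="views-row views-row-1">
    <div class="views-field views-field-title">
      <span class="field-content"><a href="#">Cultures and Identities in Europe</a></span>
    </div>
    <div class="views-field views-field-field-start-date-text">
      <span class="field-content">June 4, 2018</span>
    </div>
  </div>
  <div class="views-row views-row-2">
    <div class="views-field views-field-title">
      <span class="field-content"><a href="#">Another Course</a></span>
    </div>
    <div class="views-field views-field-field-start-date-text">
      <span class="field-content">July 1, 2018</span>
    </div>
  </div>
</div>
"""


def date_for_title(root, title_fragment):
    """Emulate //a[contains(text(), frag)]/ancestor::div[contains(@class,'field-title')]
    /following-sibling::div[contains(@class,'start')]/span and return the first text."""
    parent = {child: par for par in root.iter() for child in par}
    for a in root.iter("a"):
        if not (a.text and title_fragment in a.text):
            continue
        # ancestor::div[contains(@class,'field-title')]
        node = parent.get(a)
        while node is not None and not (node.tag == "div" and "field-title" in node.get("class", "")):
            node = parent.get(node)
        if node is None or parent.get(node) is None:
            continue
        # following-sibling::div[contains(@class,'start')]/span
        siblings = list(parent[node])
        for sib in siblings[siblings.index(node) + 1:]:
            if sib.tag == "div" and "start" in sib.get("class", ""):
                span = sib.find("span")
                if span is not None:
                    return span.text
    return None


root = ET.fromstring(PAGE.strip())
print(date_for_title(root, "Cultures and Identities"))  # June 4, 2018
```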

Answer 3

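I haven't tried this with Selenium specifically (I use lxml, but the XPath should be the same), but I think we can change your XPath quite a bit. For example, this XPath gets all the date strings: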

'//div[contains(@class, "views-field-field-start-date-text")]/span'

Then you can index into the result:

result[2].text
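Once the XPath selects only the date spans, the list holds one date per course in document order, so indexing picks a course by position (Python indexes from 0, so `result[2]` is the third date). The sketch below reproduces that with `xml.etree.ElementTree` on hypothetical markup. It also pairs each title with its date by searching relative to each row; in Selenium, an expression starting with `//` searches the whole document even when called on an element, while a leading `.` (as in `.//`) restricts it to that element. ElementTree's path language has no `contains()`, so this sketch matches the class attribute exactly instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical two-row listing (well-formed XML so ElementTree can parse it)
PAGE = """
<div class="view-content">
  <div class="views-row views-row-1">
    <div class="views-field views-field-title">
      <span class="field-content"><a href="#">Cultures and Identities in Europe</a></span>
    </div>
    <div class="views-field views-field-field-start-date-text">
      <span class="field-content">June 4, 2018</span>
    </div>
  </div>
  <div class="views-row views-row-2">
    <div class="views-field views-field-title">
      <span class="field-content"><a href="#">Another Course</a></span>
    </div>
    <div class="views-field views-field-field-start-date-text">
      <span class="field-content">July 1, 2018</span>
    </div>
  </div>
</div>
"""

root = ET.fromstring(PAGE.strip())

# All date spans in document order, one per course
DATE_PATH = ".//div[@class='views-field views-field-field-start-date-text']/span"
result = root.findall(DATE_PATH)
print(result[1].text)  # July 1, 2018

# Row-by-row pairing: paths starting with "." are evaluated relative to each row,
# so every row contributes exactly one title and one date
rows = []
for row in root.findall("./div"):
    title = row.find(".//div[@class='views-field views-field-title']//a")
    date = row.find(DATE_PATH)
    rows.append({"moocs": title.text if title is not None else None,
                 "dates": date.text if date is not None else None})
print(rows)
```

A list of dicts like `rows` can be passed straight to `pd.DataFrame(rows)`, which keeps titles and dates aligned.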

Selenium PY - xpath - loop not working
