I have table rows that I need to extract data from, and they come in several variants:
Case 1
Onsite Service After Remote Diagnosis April 19, 2014 April 19, 2017
Case 2
CAR October 15, 2016 October 15, 2017
Onsite Service After Remote Diagnosis October 15, 2016 October 15, 2019
Case 3
NBD ProSupport July 16, 2008 July 15, 2011
Onsite Service After Remote Diagnosis July 16, 2008 July 15, 2011
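The information I need is in the row whose second td contains "Onsite Service After Remote Diagnosis". In every case, the date I want is to the right of that text, on the same row.
Expected output: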
April 19, 2017
October 15, 2017
July 15, 2011
My code:
from selenium import webdriver
import time
from openpyxl import load_workbook

driver = webdriver.Chrome()

def scrape(codes):
    dates = []
    for i in range(len(codes)):
        driver.get("https://www.dell.com/support/home/us/en/19/product-support/"
                   "servicetag/%s/warranty?ref=captchasuccess" % codes[i])
        # Solve captcha manually
        if i == 0:
            print("You now have 120\" seconds to solve the captcha")
            time.sleep(120)
            print("120\" Passed")
        # Extract data
        expdate = driver.find_element_by_css_selector("#printdivid > div > div.not-annotated.hover > table:nth-child(3) > tbody > tr > td:nth-child(3)")
        print(expdate.get_attribute('innerText'))
    driver.close()

codes = ['159DT3J', '15FDBG2', '10V8YZ1']
scrape(codes)
My output:
April 19, 2014
October 15, 2016
July 16, 2008
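These values are taken from the first row that appears and from the first date td in it.
I tried changing tbody > tr > td:nth-child(3), but identifying the row by its text would be better and would avoid errors.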
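Answer 0 (score: 1)
Since you need the text that belongs to "Onsite Service After Remote Diagnosis", I suggest replacing the line that locates the element with the following: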
expdate = driver.find_element_by_xpath("//td[text()='Onsite Service After Remote Diagnosis']/following-sibling::td")
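Here we use an XPath locator that finds the td whose text is 'Onsite Service After Remote Diagnosis' and then selects the td that follows it in the same row.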
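The row-by-text idea can be tried offline, without a browser. The following is a minimal sketch using only the standard library. The table markup in SAMPLE_HTML is an assumption (a simplified stand-in for the real page, with an empty first cell), and end_date_for is a hypothetical helper name, not part of Selenium.

```python
import xml.etree.ElementTree as ET

# Assumed, simplified markup: the real page layout may differ.
# The label sits in the second td, followed by a start date and an end date.
SAMPLE_HTML = """
<table>
  <tbody>
    <tr><td></td><td>NBD ProSupport</td><td>July 16, 2008</td><td>July 15, 2011</td></tr>
    <tr><td></td><td>Onsite Service After Remote Diagnosis</td><td>July 16, 2008</td><td>July 15, 2011</td></tr>
  </tbody>
</table>
"""

LABEL = "Onsite Service After Remote Diagnosis"


def end_date_for(table_markup, label=LABEL):
    """Return the text of the last cell of the first row containing `label`.

    Returns None when no row has a cell whose text equals `label`.
    """
    root = ET.fromstring(table_markup)
    for row in root.iter("tr"):
        # Normalise each cell: empty cells have text None.
        cells = [(td.text or "").strip() for td in row.findall("td")]
        if label in cells:
            return cells[-1]  # last cell in the row = right-most date
    return None


print(end_date_for(SAMPLE_HTML))
```

With Selenium, find_element returns only the first match, so a plain following-sibling::td step yields the sibling closest to the label. If the end date is the last cell of the row, as assumed above, an XPath such as //td[normalize-space()='Onsite Service After Remote Diagnosis']/following-sibling::td[last()] should select it instead. Recent Selenium releases also expect driver.find_element(By.XPATH, ...) rather than find_element_by_xpath.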