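Identifying table rows and table data: CSS selectors in Python

Time: 2018-09-20 09:51:04

Tags: python selenium html-table css-selectors

I have table rows in several different layouts that I want to extract data from: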

Case 1

 Onsite Service After Remote Diagnosis  April 19, 2014  April 19, 2017

Case 2

CAR                                     October 15, 2016    October 15, 2017    
Onsite Service After Remote Diagnosis   October 15, 2016    October 15, 2019

Case 3

NBD ProSupport                          July 16, 2008   July 15, 2011   
Onsite Service After Remote Diagnosis   July 16, 2008   July 15, 2011

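The information I need is in the row whose second td contains "Onsite Service After Remote Diagnosis"; in each case, the date I want is the one at the right-hand end of that row.

Expected output: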

                      April 19, 2017
                    October 15, 2017
                       July 15, 2011

My code:

from selenium import webdriver
import time
from openpyxl import load_workbook

driver = webdriver.Chrome()


def scrape(codes):
    dates = []
    for i in range(len(codes)):
        driver.get("https://www.dell.com/support/home/us/en/19/product-support/"
                   "servicetag/%s/warranty?ref=captchasuccess" % codes[i])

        # Solve captcha manually
        if i == 0:
            print("You now have 120\" seconds to solve the captcha")
            time.sleep(120)
            print("120\" Passed")
        # Extract data
        expdate = driver.find_element_by_css_selector("#printdivid > div > div.not-annotated.hover > table:nth-child(3) > tbody > tr > td:nth-child(3)")
        print(expdate.get_attribute('innerText'))
    driver.close()

codes = ['159DT3J', '15FDBG2', '10V8YZ1']
scrape(codes)

My output:

April 19, 2014
October 15, 2016
July 16, 2008

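These values are taken from the first row that appears and from the first td matched. I tried changing tbody > tr > td:nth-child(3), but identifying the row by its text would be better and would avoid errors.

1 answer:

Answer 0: (score: 1)

Since you need to extract the text for "Onsite Service After Remote Diagnosis", I suggest updating the line that finds the element as follows: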

expdate = driver.find_element_by_xpath("//td[text()='Onsite Service After Remote Diagnosis']/following-sibling::td")

Here we use an XPath locator to find the td whose text is 'Onsite Service After Remote Diagnosis' and then select the td that follows it.
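
The text-based lookup can be tried without a browser. The sketch below is only an illustration: the markup is hypothetical (modelled on Case 3, since the real page structure is not shown here), the helper name end_date_for is made up, and it assumes the expiry date is the last cell of the matching row. Because the standard library's ElementTree supports only a subset of XPath (no following-sibling axis), it selects the row by its label and then takes the last cell.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup modelled on Case 3 in the question; the structure of the
# real page is an assumption.
SAMPLE_TABLE = """
<table>
  <tbody>
    <tr><td>NBD ProSupport</td><td>July 16, 2008</td><td>July 15, 2011</td></tr>
    <tr><td>Onsite Service After Remote Diagnosis</td><td>July 16, 2008</td><td>July 15, 2011</td></tr>
  </tbody>
</table>
"""

LABEL = "Onsite Service After Remote Diagnosis"


def end_date_for(table_html, label=LABEL):
    """Return the last cell's text from the first row that has a cell equal to label."""
    root = ET.fromstring(table_html)
    # [td='...'] keeps only rows with a <td> child whose full text equals the label.
    row = root.find(".//tr[td='%s']" % label)
    if row is None:
        return None  # no row carries this label
    cells = row.findall("td")
    # Assumption: the expiry date is the right-most cell of the row.
    return (cells[-1].text or "").strip()


print(end_date_for(SAMPLE_TABLE))  # prints: July 15, 2011
```

With Selenium itself, find_element returns only the first node that matches, so the XPath in the answer yields the first td after the label. If the expiry date is the last cell, as the sample rows suggest, appending [last()] to the following-sibling::td step should select it instead. Also note that the find_element_by_* helpers were removed in Selenium 4.3; on current versions the equivalent call is driver.find_element(By.XPATH, ...), with By imported from selenium.webdriver.common.by.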