How to iterate over a table and print the first 10 rows of results using Python and Selenium WebDriver?

Time: 2019-02-21 01:33:09

Tags: python selenium selenium-webdriver xpath css-selectors

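Using Selenium WebDriver and Python, I can locate the search box and run a search that returns results. However, I want to print the results from the first 10 rows returned, excluding the header row.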

The site I'm using is http://www.hoovers.com/company-information/company-search.html?term=simon, with "simon" as an example search term.

I've been searching for a while and have tried many things, including XPaths, most of which produced errors. This is the closest I've gotten so far:

for row in mydriver.find_elements_by_class_name('cmp-company-directory'):
        cell = row.find_elements_by_tag_name("td")[0]
        print(cell.text)

However, it only returns the first row and doesn't iterate over the table. Any tips? TIA!

2 answers:

Answer 0 (score: 0)

Try the XPath below. It iterates over the table and prints the first 10 rows.

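from selenium.webdriver.common.by import By

# All <td> cells of the results table, listed row after row
elements = driver.find_elements(By.XPATH, "//div[@class='clear data-table sortable-header dashed-table-tr alternate-rows']//tr/td")
# The first 49 cells cover the 10 rows shown in the output below
for element in elements[:49]:
    print(element.text)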

Output:

Simon Property Group, Inc.
Indianapolis, IN, United States
$5538.64M
See Details

SIMON & SCHUSTER (UK) LIMITED
London, London, England
$60.39M
See Details

SIMON JERSEY GROUP LIMITED
Accrington, Lancashire, England

See Details

Simon Worldwide, Inc.
Irvine, CA, United States
$0.0M
See Details

Simon Property Group, L.P.
Indianapolis, IN, United States
$5538.64M
See Details

Günter Simon e.K. Inh. Carmen Simon
Ravensburg, Baden-Württemberg, Germany

See Details

Simon e Simon Servicos Odontologicos Ltda
Vere, Parana, Brazil

See Details

Simon Comercial e Industrial Ltda Em Recuperacao Judicial
Aparecida De Goiania, Goias, Brazil

See Details

Simon Levelt B.V.
Haarlem, Noord-Holland, The Netherlands

See Details

SIMON SAU
Barcelona, Barcelona, Spain
$115.95M
See Details

If you only want to print the company names from the first 10 rows, try this:

from selenium.webdriver.common.by import By

# Only the company-name cells of the results table
elements = driver.find_elements(By.XPATH, "//div[@class='clear data-table sortable-header dashed-table-tr alternate-rows']//tr/td[@class='company_name']")
# Slicing stops after the first 10 names
for element in elements[:10]:
    print(element.text)

Output:

Simon Property Group, Inc.
SIMON & SCHUSTER (UK) LIMITED
SIMON JERSEY GROUP LIMITED
Simon Worldwide, Inc.
Simon Property Group, L.P.
Günter Simon e.K. Inh. Carmen Simon
Simon e Simon Servicos Odontologicos Ltda
Simon Comercial e Industrial Ltda Em Recuperacao Judicial
Simon Levelt B.V.

Let me know if this works for you.

Answer 1 (score: 0)

To print the company names (excluding the header row), use WebDriverWait to wait for visibility_of_all_elements_located(). You can use either of the following solutions:

  • CSS_SELECTOR:

    print([company_name.get_attribute("innerHTML") for company_name in WebDriverWait(driver, 10).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "div.cmp-company-directory table td.company_name>a")))])

  • XPATH:

    print([company_name.get_attribute("innerHTML") for company_name in WebDriverWait(driver, 10).until(EC.visibility_of_all_elements_located((By.XPATH, "//div[@class='cmp-company-directory']//table//td[@class='company_name']/a")))])

To print only the first 10 company names (excluding the header row), use WebDriverWait in the same way. Then slice the returned list with [:10] to limit it to 10 elements. You can use either of the following solutions:

  • CSS_SELECTOR:

    print([company_name.text for company_name in WebDriverWait(driver, 10).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "div.cmp-company-directory table td.company_name>a")))[:10]])

  • XPATH:

    print([company_name.text for company_name in WebDriverWait(driver, 10).until(EC.visibility_of_all_elements_located((By.XPATH, "//div[@class='cmp-company-directory']//table//td[@class='company_name']/a")))[:10]])

Note: you have to add the following imports:

    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.common.by import By
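The loop in the question most likely runs only once for a specific reason. find_elements_by_class_name('cmp-company-directory') matches the single <div> that wraps the results table, not the individual rows. Then [0] picks the first cell inside that div.

The following is a minimal sketch that walks the table row by row instead. It rests on two unverified assumptions:
  • the container and table structure match the div.cmp-company-directory table selectors used in this answer;
  • the header row uses <th> rather than <td> cells.

The helper name first_rows is hypothetical. The locators are passed as the plain strings "css selector" and "tag name", which are the values of By.CSS_SELECTOR and By.TAG_NAME.

```python
def first_rows(driver, n=10):
    """Return the cell texts of the first n data rows of the results table.

    Assumes the rows live under div.cmp-company-directory table and that the
    header row contains no <td> cells; both are assumptions about the page.
    """
    rows = []
    # "css selector" is the value of By.CSS_SELECTOR
    for row in driver.find_elements("css selector", "div.cmp-company-directory table tr"):
        # "tag name" is the value of By.TAG_NAME
        cells = row.find_elements("tag name", "td")
        if not cells:
            continue  # header row made of <th> cells: skip it
        rows.append([cell.text for cell in cells])
        if len(rows) == n:
            break  # stop once n data rows have been collected
    return rows


# Usage (after the search results have loaded, e.g. behind a WebDriverWait):
# for cells in first_rows(driver):
#     print(cells[0])  # first cell; the company name in Answer 0's output
```

Because it only calls find_elements(by, value) and .text, the function works with a real WebDriver. It can also be exercised with stand-in objects when Selenium is not installed. The call does not wait by itself, so run it after a WebDriverWait like the ones above.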