Accessing detail information from anchor tags on a web page

Asked: 2019-10-12 07:05:17

Tags: python selenium csv web-scraping

I am scraping a web page, and I have managed to extract the data from a table into a CSV file using Selenium. What I am struggling with is getting the information behind the anchor tag that is present on every row of the table.

I tried clicking each anchor tag in the table to pull the information from the corresponding URL, but it stops after clicking the first one with the error "Message: stale element reference: element is not attached to the page document". I am not sure whether this is even the right approach to the problem. Here is the code I have tried so far. Apologies if the code is not formatted properly; I am new to Python and Stack Overflow.

import csv
import requests
import time
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

browser = webdriver.Chrome(executable_path=r"D:\jewel\chromedriver.exe")
browser.get('https://e-sourcingni.bravosolution.co.uk/web/login.shtml')
signInButton = browser.find_element_by_css_selector(".only")
signInButton.click()
time.sleep(5)
table = browser.find_element_by_css_selector(".list-table")

# this loop raises "stale element reference" after the first click
for a in browser.find_elements_by_css_selector(".detailLink"):
    a.click()
    time.sleep(2)
    browser.execute_script("window.history.go(-1)")
    time.sleep(2)

with open('output.csv', "w") as f:
    writer = csv.writer(f)
    writer.writerow(["S.No", "Status", "Organization", "Project Title", "First Publishing Date", "Work Category", "Listing Deadline"])
    for row in table.find_elements_by_css_selector('tr'):
        writer.writerow([d.text for d in row.find_elements_by_css_selector('td')])

browser.close()

What I need is the data behind the href of every tag with the class detailLink, and I cannot work out the right way to do it.
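For what it is worth, one way to read those href values without clicking at all (and therefore without ever invalidating the elements) is to collect them as plain strings first and then visit each URL directly. A minimal sketch, assuming the login flow above has already run and that each .detailLink element exposes a real href rather than a JavaScript handler:

# Collect the href values up front; plain strings cannot go stale.
links = browser.find_elements_by_css_selector(".detailLink")
detail_urls = [a.get_attribute("href") for a in links]

for url in detail_urls:
    browser.get(url)   # navigate straight to the detail page
    time.sleep(2)      # crude wait; an explicit wait would be more robust
    # ... scrape whatever is needed from the detail page here ...

Because the URLs are copied out before any navigation happens, the loop never touches a WebElement belonging to a page that has been left.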

2 Answers:

Answer 0 (score: 1)

I use a normal indexed for loop to iterate over the table instead of a for-each loop, re-finding the links on every iteration so the references are never stale. Try this and let me know how it goes.

import csv
import time
from selenium import webdriver

browser = webdriver.Chrome('/usr/local/bin/chromedriver')  # optional argument; if not specified, PATH is searched
browser.implicitly_wait(5)

browser.execute_script("window.open('about:blank','tab1');")
browser.switch_to.window("tab1")
browser.get('https://e-sourcingni.bravosolution.co.uk/web/login.shtml')
signInButton = browser.find_element_by_css_selector(".only")
signInButton.click()
time.sleep(5)

# Re-find the links on every iteration: after navigating back, the old
# references are stale, but a fresh lookup by index still works.
links = browser.find_elements_by_css_selector(".detailLink")
for i in range(len(links)):
    links = browser.find_elements_by_css_selector(".detailLink")
    links[i].click()
    time.sleep(2)
    browser.execute_script("window.history.go(-1)")
    time.sleep(2)

with open('output.csv', "w", newline='', encoding='utf-8') as f:
    writer = csv.writer(f)
    writer.writerow(["S.No", "Status", "Organization", "Project Title", "First Publishing Date", "Work Category", "Listing Deadline"])
    rows = browser.find_elements_by_xpath("//table[@class='list-table']//tr")
    for row in range(1, len(rows) + 1):  # XPath positions are 1-based
        x = []
        for d in browser.find_elements_by_xpath("//table[@class='list-table']//tr[" + str(row) + "]//td"):
            x.append(d.text)
        writer.writerow(x)

browser.close()
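As a side note, the fixed time.sleep calls could be replaced with explicit waits, which proceed as soon as the elements are actually present. A small sketch of that idea, reusing the .detailLink selector from the question:

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Wait up to 10 seconds for the links to appear instead of sleeping blindly.
wait = WebDriverWait(browser, 10)
links = wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, ".detailLink")))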

Answer 1 (score: 0)

Yes, because you have moved on to the next page, the page has changed, and an element found on the previous page can no longer be located. One way around this is to avoid leaving the listing page at all.
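A minimal sketch of that idea, opening each detail link in a new tab so the listing page and its elements are never detached. This is illustrative code, not this answer's original snippet, and it assumes each .detailLink exposes a real href:

import time

main = browser.current_window_handle
for a in browser.find_elements_by_css_selector(".detailLink"):
    url = a.get_attribute("href")
    browser.execute_script("window.open(arguments[0]);", url)  # open detail page in a new tab
    browser.switch_to.window(browser.window_handles[-1])
    time.sleep(2)
    # ... scrape the detail page here ...
    browser.close()                   # close the detail tab
    browser.switch_to.window(main)    # return to the untouched listing page

Since the listing page is never navigated away from, its .detailLink elements stay attached to the document and no stale element reference is raised.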
