I am scraping a web page and have managed to extract the data from a table into a CSV file using Selenium. What I am struggling with is getting the information from the anchor tag that is present on every row of the table.
I tried clicking each of the anchor tags in the table to pull the information from the corresponding URL, but it stops after clicking the first one and throws this error: Message: stale element reference: element is not attached to the page document. I am not sure whether this is even the right way to go about it. Here is the code I have tried so far. Apologies if the code is not formatted properly; I am new to Python and Stack Overflow.
import csv
import requests
import time
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

browser = webdriver.Chrome(executable_path=r"D:\jewel\chromedriver.exe")
browser.get('https://e-sourcingni.bravosolution.co.uk/web/login.shtml')

signInButton = browser.find_element_by_css_selector(".only")
signInButton.click()
time.sleep(5)

table = browser.find_element_by_css_selector(".list-table")
for a in browser.find_elements_by_css_selector(".detailLink"):
    a.click()
    time.sleep(2)
    browser.execute_script("window.history.go(-1)")
    time.sleep(2)

with open('output.csv', "w") as f:
    writer = csv.writer(f)
    writer.writerow(["S.No","Status","Organization","Project Title","First Publishing Date","Work Category","Listing Deadline"])
    for row in table.find_elements_by_css_selector('tr'):
        writer.writerow([d.text for d in row.find_elements_by_css_selector('td')])

browser.close()
I need to get the data from the href of the tags that have the class detailLink, and I have not been able to figure out the right way to do it.
Answer 0 (score: 1)
I used a plain index-based for loop to iterate over the table instead of a for-each loop, so the link elements are re-located after every navigation. Try this and let me know how it goes.
import csv
import time
from selenium import webdriver

browser = webdriver.Chrome('/usr/local/bin/chromedriver')  # Optional argument, if not specified will search path.
browser.implicitly_wait(5)
browser.execute_script("window.open('about:blank','tab1');")
browser.switch_to.window("tab1")
browser.get('https://e-sourcingni.bravosolution.co.uk/web/login.shtml')

signInButton = browser.find_element_by_css_selector(".only")
signInButton.click()
time.sleep(5)

table = browser.find_element_by_css_selector(".list-table")
links = browser.find_elements_by_css_selector(".detailLink")
for i in range(len(links)):
    # Re-locate the links after every navigation so the reference is never stale.
    links = browser.find_elements_by_css_selector(".detailLink")
    links[i].click()
    time.sleep(2)
    browser.execute_script("window.history.go(-1)")
    time.sleep(2)

with open('output.csv', "w") as f:
    writer = csv.writer(f)
    writer.writerow(["S.No","Status","Organization","Project Title","First Publishing Date","Work Category","Listing Deadline"])
    table = browser.find_elements_by_xpath("//table[@class='list-table']//tr")
    for row in range(len(table)):
        x = []
        # XPath positions are 1-based, so offset the loop index by one.
        for d in browser.find_elements_by_xpath("//table[@class='list-table']//tr[" + str(row + 1) + "]//td"):
            x.append(d.text.encode('utf-8'))
        writer.writerow(x)

browser.close()
Answer 1 (score: 0)
Yes, since you moved on to the next page, the page changed, so the element located on the previous page can no longer be found. You can try this:
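A minimal sketch of one way to avoid the stale reference, assuming each .detailLink anchor exposes its target URL in an href attribute: read all the hrefs while you are still on the listing page, then load each detail page directly with browser.get() so no element reference has to outlive a navigation. The output file name and the fields written are placeholders.

import csv
import time
from selenium import webdriver

browser = webdriver.Chrome(executable_path=r"D:\jewel\chromedriver.exe")
browser.get('https://e-sourcingni.bravosolution.co.uk/web/login.shtml')
browser.find_element_by_css_selector(".only").click()
time.sleep(5)

# Collect every href before leaving the listing page, so no element
# reference has to survive a navigation (this is what avoids the stale element error).
detail_urls = [a.get_attribute("href")
               for a in browser.find_elements_by_css_selector(".detailLink")]

with open('details.csv', "w") as f:  # hypothetical output file for the detail pages
    writer = csv.writer(f)
    for url in detail_urls:
        browser.get(url)  # open the detail page directly instead of clicking the row
        time.sleep(2)
        # Scrape whatever is needed from the detail page here; writing the URL
        # is just a placeholder for the real fields.
        writer.writerow([url])

browser.close()

Because the listing page is only visited once, there is also no need to navigate back after each detail page.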