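I am trying to scrape the following website:
finsight.com/product/us/abs/ee
Specifically, for each row I am trying to extract the type (AUTO or CMBS), the company name, and the download link. However, when I run the loop I only get the name and link of the first row (in this case AUTO, CarMax Auto Owner Trust 2018-2), repeated for every row.
Here is the code I have so far: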
import selenium
import time
import requests
from selenium import webdriver
url = "https://finsight.com/product/us/abs/ee"
driver = webdriver.Chrome()
driver.get(url)
time.sleep(1)
company_row = driver.find_elements_by_xpath("//div[@class='ee-item portlet box ng-scope']")
for row in company_row:
    RD_element = row.find_element_by_xpath("//a[@class='related-document ng-scope']")
    company_name = row.find_element_by_xpath("//span[contains(@class,'filing-left filing-issuer ng-binding')]")
    company_type = row.find_element_by_xpath("//span[contains(@class,'filing-left filing-sector ng-binding')]")
    RD_link = RD_element.get_attribute('href')
    print (company_name.text)
    print (company_type.text)
    print (RD_link)
My code produces the following output:
DevTools listening on ws://127.0.0.1:12060/devtools/browser/c5d13168-0976-41c7-937c-ff2bd4cd99fe
CarMax Auto Owner Trust 2018-2
AUTO
https://finsight.com/api/download-csv?file_id=15395
CarMax Auto Owner Trust 2018-2
AUTO
https://finsight.com/api/download-csv?file_id=15395
CarMax Auto Owner Trust 2018-2
AUTO
https://finsight.com/api/download-csv?file_id=15395
CarMax Auto Owner Trust 2018-2
AUTO
https://finsight.com/api/download-csv?file_id=15395
CarMax Auto Owner Trust 2018-2
Answer 0 (score: 0)
Here is working code for your case:
from selenium import webdriver
from selenium.webdriver.support import ui
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get("https://finsight.com/product/us/abs/ee")

# Wait up to 10 seconds for the rows to become visible instead of sleeping.
company_rows = ui.WebDriverWait(driver, 10).until(
    EC.visibility_of_all_elements_located((By.CSS_SELECTOR, ".ee-item.portlet.box.ng-scope"))
)
for row in company_rows:
    # Every lookup is called on `row`, so it only searches inside that row.
    # find_element(By.CSS_SELECTOR, ...) is used because the
    # find_element_by_* helpers were removed in Selenium 4.
    RD_element = row.find_element(By.CSS_SELECTOR, ".related-document.ng-scope")
    RD_link = RD_element.get_attribute("href")
    company_name = row.find_element(By.CSS_SELECTOR, ".filing-left.filing-issuer.ng-binding")
    company_type = row.find_element(By.CSS_SELECTOR, ".filing-left.filing-sector.ng-binding")
    print(company_name.text)
    print(company_type.text)
    print(RD_link)
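PS: I used CSS selectors here instead of XPath. A CSS selector evaluated on a row element only matches that row's descendants, whereas an XPath that starts with // searches the whole document even when it is called on a row, which is why your loop kept returning the first row's values.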
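If you would rather keep XPath, a leading dot should be enough to scope each expression to the current row. Below is a minimal sketch, assuming the class names from the question still match the page; `extract_row` is a hypothetical helper name, and the locator strategy is passed as the plain string "xpath", which is the value of Selenium's `By.XPATH` constant.

```python
XPATH = "xpath"  # same string as selenium.webdriver.common.by.By.XPATH


def extract_row(row):
    """Return (name, sector, link) for one row WebElement.

    The leading "." anchors each XPath at `row`. Without it, "//" starts
    at the document root, so every row yields the page's first match.
    """
    link = row.find_element(XPATH, ".//a[contains(@class, 'related-document')]")
    name = row.find_element(XPATH, ".//span[contains(@class, 'filing-issuer')]")
    sector = row.find_element(XPATH, ".//span[contains(@class, 'filing-sector')]")
    return name.text, sector.text, link.get_attribute("href")
```

Inside the loop you would call `extract_row(row)` for each element of the row list and print or store the returned tuple.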