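I am trying to scrape a web page with Selenium, but for some reason the elements I need do not show up in the page source.
I have tried using WebDriverWait to wait until the page has loaded. I have also checked whether the data sits in a different frame that I would need to switch to.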
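from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
delay = 10  # seconds; assumed value, the original snippet did not show it
driver = webdriver.Chrome()  # assumed browser, the original snippet did not show the driver setup
driver.get('https://foreclosures.cabarruscounty.us/')
try:
    # Wait until the target container is present in the DOM
    WebDriverWait(driver, delay).until(EC.presence_of_element_located((By.XPATH, '//*[@id="app"]/div[5]/div/div')))
    print("Page is ready!")
    web_url = driver.page_source
    print(web_url)
except TimeoutException:
    print("Loading took too much time!")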
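I expect to see all the records on each individual property card and then extract them. However, the page source does not show any of this data.
If I load the page manually and inspect the source, the data is not there either: view-source:https://foreclosures.cabarruscounty.us/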
Answer 0: (score: 1)
Try the code below. It returns all the elements. Use visibility_of_all_elements_located():
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium import webdriver
driver=webdriver.Chrome()
driver.get("https://foreclosures.cabarruscounty.us/")
elements=WebDriverWait(driver,30).until(EC.visibility_of_all_elements_located((By.XPATH,"//div[@id='app']//div[@class='card-body']/div[1]")))
allrecord=[ele.text for ele in elements]
print(allrecord)  # prints the text of every record card
To print only the values of the first element:
print(allrecord[0].splitlines())
You will get the following output:
['Real ID: 04-086 -0040.00', 'Status: SALE SCHEDULED', 'Case Number: 18-CVD-2804', 'Tax Value: $29,660', 'Min Bid: $10,067', 'Sale Date: 10/03/2019', 'Sale Time: 12:00 PM', 'Owner: DOUGLAS JAMES W', 'Attorney: ZACCHAEUS LEGAL SVCS']
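Since every line in that output has the form "Label: value", the card text can be turned into a dictionary for easier extraction. The following is a minimal sketch, not part of the original answer: `parse_card` is a hypothetical helper name, and the sample text is simply the output shown above joined back into one string.

```python
def parse_card(card_text):
    """Turn 'Label: value' lines into a dict keyed by label."""
    record = {}
    for line in card_text.splitlines():
        # Split on the first colon only, so values like '12:00 PM' stay intact
        label, sep, value = line.partition(":")
        if sep:  # ignore lines that contain no colon
            record[label.strip()] = value.strip()
    return record


# Sample text taken from the output above (stands in for allrecord[0])
sample = "\n".join([
    "Real ID: 04-086 -0040.00",
    "Status: SALE SCHEDULED",
    "Case Number: 18-CVD-2804",
    "Tax Value: $29,660",
    "Min Bid: $10,067",
    "Sale Date: 10/03/2019",
    "Sale Time: 12:00 PM",
    "Owner: DOUGLAS JAMES W",
    "Attorney: ZACCHAEUS LEGAL SVCS",
])

record = parse_card(sample)
print(record["Case Number"])  # 18-CVD-2804
print(record["Sale Time"])    # 12:00 PM
```

With a live driver, the same helper could presumably be applied to every card with `[parse_card(text) for text in allrecord]`.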
Answer 1: (score: 1)
To extract the first Real ID, Case Number and Owner fields, you have to induce WebDriverWait for visibility_of_element_located(), and you can use the following Locator Strategies:
Code block:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument("start-maximized")
chrome_options.add_experimental_option("excludeSwitches", ["enable-automation"])
chrome_options.add_experimental_option('useAutomationExtension', False)
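# Note: executable_path= works on Selenium 3 and early 4.x; Selenium 4.10+ expects service=Service(path) instead
driver = webdriver.Chrome(options=chrome_options, executable_path=r'C:\Utility\BrowserDrivers\chromedriver.exe')
driver.get("https://foreclosures.cabarruscounty.us/")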
Real_ID = WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//div[@class='row']//div[@class='card cardClass']/img//following::div[@class='card-body']//div/b"))).text
Case_Number = WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//div[@class='row']//div[@class='card cardClass']/img//following::div[@class='card-body']//div//following-sibling::b[2]"))).text
Owner = WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//div[@class='row']//div[@class='card cardClass']/img//following::div[@class='card-body']//div//following-sibling::b[7]"))).text
print("{} is {} owned by {}".format(Real_ID,Case_Number,Owner))
driver.quit()
Console output:
Real ID: 04-086 -0040.00 is Case Number: 18-CVD-2804 owned by Owner: DOUGLAS JAMES W
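Each WebDriverWait(...).until(...) call above polls its condition until it returns a truthy value or the timeout elapses, which is why the JavaScript-rendered cards become available even though they are absent from the initial page source. The sketch below illustrates that polling idea in plain Python so it runs without a browser. It is a simplified stand-in, not Selenium's implementation: `wait_until` and `ready_on_third_call` are hypothetical names, and the real WebDriverWait also ignores NoSuchElementException while polling and raises Selenium's own TimeoutException.

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Call condition() repeatedly until it returns a truthy value or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll_interval)


# Stand-in for a page whose content only appears after a short delay
attempts = []


def ready_on_third_call():
    attempts.append(1)
    return "card text" if len(attempts) >= 3 else None


value = wait_until(ready_on_third_call, timeout=2.0, poll_interval=0.01)
print(value, len(attempts))  # card text 3
```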
Answer 2: (score: 0)
You can use ImplicitWait and PageLoad to wait for elements:
//For 30 seconds
driver.Manage().Timeouts().ImplicitWait = TimeSpan.FromSeconds(30);
driver.Manage().Timeouts().PageLoad = TimeSpan.FromSeconds(30);
This code is for C# with Selenium.
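Since the question uses Python, here is a rough equivalent of those two timeout settings with the Python bindings. It is a sketch only: `apply_timeouts` is a hypothetical helper name, and it assumes `driver` is an already created Selenium WebDriver. The two methods it calls, `implicitly_wait` and `set_page_load_timeout`, are part of the Python WebDriver API.

```python
def apply_timeouts(driver, seconds=30):
    """Set the implicit wait and the page-load timeout on a Selenium WebDriver."""
    # Implicit wait: element lookups retry for up to `seconds` before failing
    driver.implicitly_wait(seconds)
    # Page-load timeout: navigation such as driver.get() gives up after `seconds`
    driver.set_page_load_timeout(seconds)


# Typical use (needs a real browser, so it is left commented out here):
# from selenium import webdriver
# driver = webdriver.Chrome()
# apply_timeouts(driver, 30)
# driver.get("https://foreclosures.cabarruscounty.us/")
```

Note that the Selenium documentation advises against mixing implicit waits with explicit waits such as WebDriverWait, because the combined wait times can become unpredictable.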