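I've written a Python script that uses Selenium to parse a certain link from a webpage. The link sits inside an iframe. I tried switching to the iframe, but I can't read its content to get the specific link I'm after.
Here is how to reach the target:
Logging in to the site is free.
After login, the site automatically redirects to the first page of the relevant content.
That page lists several names (members), each linking to that member's profile.
Each profile page contains a link to the member's current company (under Professional experience). That link is what I want to parse.
In the first profile, the desired link (under Professional experience) looks like this:
This is the script I've tried so far: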
from selenium import webdriver
from urllib.parse import urljoin
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

link = "https://www.xing.com"

driver = webdriver.Chrome()
driver.get(link)
wait = WebDriverWait(driver, 10)

wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, "#login_form_username"))).send_keys("user")
wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, "#login_form_password"))).send_keys("pass",Keys.RETURN)

links = [urljoin(link,items.find_element_by_css_selector(".user-name").get_attribute("href")) for items in wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, ".contact")))]
for link in links:
    driver.get(link)
    name = wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, "h2 span"))).text
    wait.until(EC.frame_to_be_available_and_switch_to_it(driver.find_element_by_css_selector("#tab-content")))
    # I get a TimeoutException on the following line
    link = wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, ".job-company-name a"))).text
    print(name,link)
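I'm not sure whether this helps, but in any case: link to the source
Answer 0 (score: 0)
I seem to have found a workaround for the problem. I'm open to accepting another answer if a better solution comes along: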
links = [urljoin(link,items.find_element_by_css_selector(".user-name").get_attribute("href")) for items in wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, ".contact")))]
for link in links:
    driver.get(link)
    name = wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, "h2 span"))).text
    ilink = driver.find_element_by_css_selector("#tab-content").get_attribute("src")
    driver.get(ilink)  # this is what I did to get around that
    try:
        link = driver.find_element_by_css_selector(".job-company-name a").text
    except Exception:
        link = ""
    print(name,link)
I didn't switch to the iframe. I just read the iframe's src URL and loaded that page directly. It's not the solution I was hoping for, but it works.
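To show the same idea without a browser, here is a minimal standard-library sketch: find the iframe's src in a page's HTML and resolve it against the page URL, which gives the address you would then load directly. The helper names (IframeSrcFinder, iframe_url), the sample markup, and the profile URL are all made up for illustration and are not taken from the real site.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class IframeSrcFinder(HTMLParser):
    """Collect the src attribute of every <iframe> with a given id."""

    def __init__(self, frame_id):
        super().__init__()
        self.frame_id = frame_id
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag != "iframe":
            return
        attributes = dict(attrs)
        # Keep only the frame we care about, and only if it has a src.
        if attributes.get("id") == self.frame_id and attributes.get("src"):
            self.sources.append(attributes["src"])


def iframe_url(page_url, html, frame_id):
    """Return the absolute URL the iframe points to, or None if not found."""
    finder = IframeSrcFinder(frame_id)
    finder.feed(html)
    if not finder.sources:
        return None
    # A relative src is resolved against the page that embeds the frame.
    return urljoin(page_url, finder.sources[0])


# Hypothetical markup and URL, used only to exercise the helper.
sample_html = '<div><iframe id="tab-content" src="/profile/tab?id=1"></iframe></div>'
print(iframe_url("https://www.xing.com/profile/Some_Name", sample_html, "tab-content"))
```

With the sample values above this prints https://www.xing.com/profile/tab?id=1. Returning None when the frame is missing lets the caller skip a profile instead of wrapping the lookup in a broad try/except.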