Question

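I'm fairly new to scraping and have run into a complex site (https://adviserinfo.sec.gov/firm/summary/104518). I can't work out how to follow a link with Selenium (the link is labelled "View Form ADV By Section"). I have since found the data I wanted elsewhere, but I would still like to know how this could be done.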

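Normally there would be an href attribute, and hovering over the menu item would show the target URL, but here nothing like that is visible. I have tried XPath expressions targeting the div / li / span, but I get an error saying the element is not interactable.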

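import time

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from webdriver_manager.chrome import ChromeDriverManager


def get_details(url):
    chrome_options = webdriver.ChromeOptions()
    # Block browser notification prompts
    prefs = {"profile.default_content_setting_values.notifications": 2}
    chrome_options.add_experimental_option("prefs", prefs)
    chrome_options.add_argument("--headless")
    # Selenium 4 style: the driver path is passed through a Service object
    driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=chrome_options)
    driver.get(url)
    print(driver.title)
    time.sleep(5)  # crude fixed wait for the page's scripts to render
    print(driver.current_url)
    soup = BeautifulSoup(driver.page_source, "html.parser")
    # Print the <li> menu entry that carries the link label
    for tag in soup.find_all("li", {"analytics-label": "View Form ADV By Section"}):
        print(tag)
    # This click is the call that fails with "element not interactable"
    driver.find_element(By.XPATH, "/html/body/div[1]/div/div/div[1]/div/div/ul/div[6]/li").click()
    time.sleep(5)
    print(driver.current_url)
    driver.quit()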

If you click the link manually, a new tab opens whose URL appears to be built from ORG_PK and FLNG_PK (https://files.adviserinfo.sec.gov/IAPD/content/viewform/adv/sections/iapd_AdvIdentifyingInfoSection.aspx?ORG_PK=104518&FLNG_PK=024FFC7A000801B405611B1102CCD535056C8CC0). I thought I could construct the URL myself, but I can't find where FLNG_PK comes from...
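To make the structure of that URL concrete, here is a minimal sketch using only the standard library. It splits the observed URL into its two query parameters and rebuilds it from them. The helper names `split_section_url` and `build_section_url` are made up for illustration, and the base path is simply copied from the URL seen in the new tab. It does not say where FLNG_PK comes from; it only shows that ORG_PK matches the number in the firm summary URL and that the link could be rebuilt once FLNG_PK is known.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Base path copied verbatim from the URL observed in the new tab.
SECTION_BASE = (
    "https://files.adviserinfo.sec.gov/IAPD/content/viewform/adv/"
    "sections/iapd_AdvIdentifyingInfoSection.aspx"
)


def split_section_url(url):
    """Return (ORG_PK, FLNG_PK) taken from the query string of a section URL."""
    query = parse_qs(urlsplit(url).query)
    return query["ORG_PK"][0], query["FLNG_PK"][0]


def build_section_url(org_pk, flng_pk):
    """Rebuild a section URL from the two keys (hypothetical helper)."""
    return f"{SECTION_BASE}?{urlencode({'ORG_PK': org_pk, 'FLNG_PK': flng_pk})}"


observed = (
    SECTION_BASE
    + "?ORG_PK=104518&FLNG_PK=024FFC7A000801B405611B1102CCD535056C8CC0"
)
org_pk, flng_pk = split_section_url(observed)
print(org_pk, flng_pk)
print(build_section_url(org_pk, flng_pk) == observed)
```

The second `print` only confirms that the round trip reproduces the observed URL; whether the site accepts a URL built this way for other firms is untested.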

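My questions are:

How does the site generate these links without any kind of href attribute (presumably a script does it somehow...)?
Is there a way to get Selenium to follow the link?
If not, is there some other way to find out or construct the link?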
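One approach that might be worth trying for the second question is sketched below. It waits for the element, clicks it through JavaScript (which skips Selenium's interactability check), then switches to whichever tab appeared and reads its URL. This is an untested assumption about this particular site, not a confirmed fix. The function names are hypothetical, and the locator in the usage comment is derived from the `analytics-label` attribute used in the BeautifulSoup search above.

```python
def find_new_handle(before, after):
    """Return the single window handle that is in `after` but not in `before`."""
    new = [handle for handle in after if handle not in before]
    if len(new) != 1:
        raise RuntimeError(f"expected exactly one new window, found {len(new)}")
    return new[0]


def follow_click_to_new_tab(driver, locator, timeout=10):
    """Click the element at `locator` and return the URL of the tab it opens."""
    # Imported here so find_new_handle stays usable without Selenium installed.
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    before = list(driver.window_handles)
    wait = WebDriverWait(driver, timeout)
    element = wait.until(EC.presence_of_element_located(locator))
    # A JavaScript click bypasses the "element not interactable" check;
    # it only helps if the click handler is attached to this element.
    driver.execute_script("arguments[0].click();", element)
    # Wait until the browser reports an additional window or tab.
    wait.until(lambda d: len(d.window_handles) > len(before))
    driver.switch_to.window(find_new_handle(before, driver.window_handles))
    return driver.current_url


# Example call (assumes an already created `driver` with the page loaded):
#   from selenium.webdriver.common.by import By
#   locator = (By.CSS_SELECTOR, 'li[analytics-label="View Form ADV By Section"]')
#   print(follow_click_to_new_tab(driver, locator))
```

If the wait for a new window times out, the click handler may sit on a child or parent element rather than the `li`, or the page may behave differently in headless mode.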

Many thanks

Scraping non-href links with Selenium

0 answers: