Question

I am trying to download the fully generated HTML source for the following URL: http://www.morningstar.com/funds/xnas/vinix/quote.html

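I am particularly interested in extracting the generated numeric data from the table under the heading "Performance VINIX", for example the row "Growth of 10,000". I tried the approach outlined in this popular answer, but the saved HTML text file looks just like the original, pre-generation source: it contains all the JavaScript and none of the generated content. For example, when I grep for the word "Growth", I get nothing.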

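I also went through the DOM structure in Chrome DevTools to identify the innermost element containing this table, whose XPath is /html/body, isolated that element with the find_element_by_xpath technique, and then saved the following string object: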

content = browser.find_element_by_xpath('/html/body').text

Still no luck. Any idea why? Many thanks!

Answer 1

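If you want to get the generated table, you need to wait until it appears in the DOM. Also note that the table is inside an iframe, so you need to switch to that frame before searching for the element: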

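from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait as wait

# `browser` is the WebDriver instance from the question.
# Wait up to 20 s for the quote iframe (its id starts with "QT_IFRAME_"), then switch into it
wait(browser, 20).until(EC.frame_to_be_available_and_switch_to_it((By.XPATH, '//iframe[starts-with(@id, "QT_IFRAME_")]')))
# Inside the frame, wait until the performance table has been rendered
table = wait(browser, 20).until(EC.presence_of_element_located((By.ID, "idPerformanceContent")))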

Then you can scrape the data you need:

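# Cells of the "Growth of 10,000" row; [1:] skips the first cell (the row label).
# find_elements(By.XPATH, ...) replaces find_elements_by_xpath, which Selenium 4 removed.
for cell in table.find_elements(By.XPATH, './/tr[td="Growth of 10,000"]/td')[1:]:
    print(cell.text)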
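If you would rather save the generated HTML (the original goal) and parse it offline, `browser.page_source` read after the waits above should return the rendered markup of the frame you switched into, not the pre-generation source. The sketch below is one way to pull the same row out of such a string using only the standard-library `html.parser`. It is illustrative: `extract_row`, `RowExtractor`, and `SAMPLE` are hypothetical names, the sample table is made up (the values are not real Morningstar data), and the real page's markup may differ. It assumes no nested tables.

```python
from html.parser import HTMLParser


class RowExtractor(HTMLParser):
    """Collect the text of the <td> cells of every table row (assumes no nested tables)."""

    def __init__(self):
        super().__init__()
        self.rows = []    # finished rows, each a list of cell strings
        self._row = None  # cells of the <tr> currently open
        self._cell = None # text fragments of the <td> currently open

    def _close_cell(self):
        if self._cell is not None:
            # Collapse internal whitespace so "Growth  of\n10,000" matches too
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None

    def _close_row(self):
        if self._row is not None:
            self._close_cell()
            self.rows.append(self._row)
            self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._close_row()        # tolerate an omitted </tr>
            self._row = []
        elif tag == "td" and self._row is not None:
            self._close_cell()       # tolerate an omitted </td>
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._row is not None:
            self._close_cell()
        elif tag in ("tr", "table"):
            self._close_row()


def extract_row(html, label):
    """Return the cells after the first one in the first row that has a <td> equal to
    `label`, roughly mirroring .//tr[td=label]/td followed by [1:]."""
    parser = RowExtractor()
    parser.feed(html)
    parser.close()
    for row in parser.rows:
        if label in row:
            return row[1:]
    return []


# Made-up sample markup, standing in for browser.page_source
SAMPLE = """
<table id="idPerformanceContent">
  <tr><th>Period</th><th>1-Day</th><th>1-Week</th></tr>
  <tr><td>Growth of 10,000</td><td>10,012</td><td>10,105</td></tr>
  <tr><td>Fund</td><td>0.12</td><td>1.05</td></tr>
</table>
"""

if __name__ == "__main__":
    print(extract_row(SAMPLE, "Growth of 10,000"))  # ['10,012', '10,105']
```

With a live browser you would pass `browser.page_source` (captured while still switched into the iframe) instead of `SAMPLE`. Parsing offline avoids one WebDriver round trip per cell, at the cost of depending on the exact table markup.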

Unable to get the generated HTML source with Python Selenium
