Question

Here is my Python code:

import pandas as pd
import pandas_datareader.data as web
import bs4 as bs
import urllib.request as ul

from matplotlib import style  # assumed missing import: style.use() below needs it
from selenium import webdriver

style.use('ggplot')
driver = webdriver.PhantomJS(executable_path='C:\\Phantomjs\\bin\\phantomjs.exe')


def getBondRate():
    # driver.delete_all_cookies()
    url = "https://www.marketwatch.com/investing/index/tnx?countrycode=xx"

    driver.get(url)
    driver.implicitly_wait(10)
    html = driver.page_source
    return html


bondRate = getBondRate()
print(bondRate)

A few days ago this read the page from MarketWatch just fine. Now it returns nothing inside the body tag. Is Selenium not loading the page?

Answer 1

Do you still need the HTML tags? If not, you can try retrieving the text through the body tag instead. This is how I do it in Java:

String src=driver.findElement(By.tagName("body")).getText();
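
Since the question uses Python, a rough equivalent of the Java line above might look like the sketch below. The helper name `body_text` is hypothetical, and `driver` is assumed to be a WebDriver that has already loaded the page. The string `"tag name"` is the value behind Selenium's `By.TAG_NAME` constant, used directly so the snippet needs no extra import.

```python
def body_text(driver):
    """Return the visible text of the page's <body> element.

    `driver` is assumed to be a Selenium WebDriver on which
    driver.get(url) has already been called.
    """
    # "tag name" is the locator string that By.TAG_NAME stands for.
    body = driver.find_element("tag name", "body")
    return body.text
```

If the body is empty, as the question describes, this simply returns an empty string, so it does not by itself work around the problem.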

Answer 2

Given the URL https://www.marketwatch.com/investing/index/tnx?countrycode=xx, the behavior you are observing is pretty much what one would expect.

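I took your code and, with one small tweak, tried to extract page_source using both PhantomJS and ChromeDriver. It appears that whichever WebDriver variant you use, the site detects the WebDriver fingerprint and a Fingerprinting error is raised, as follows: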

Error details:

Failed to load resource: the server responded with a status of 404 (Not Found)
kpf.js?url=/149e9513-01fa-4fb0-aad4-566afd725d1b/2d206a39-8ed7-437e-a3be-862e0f06eea3/fingerprint&token=058cbc6a-f8b8-f175-ca68-8c2e0fd6a4e3:1 Fingerprinting error 
  name: Error 
  message: Error issuing AJAX request (status code: 404) 
  stack: Error: Error issuing AJAX request (status code: 404)
    at XMLHttpRequest.N.a.onreadystatechange (https://www.marketwatch.com/149e9513-01fa-4fb0-aad4-566afd725d1b/2d206a39-8ed7-437e-a3be-862e0f06eea3/fingerprint/script/kpf.js?url=/149e9513-01fa-4fb0-aad4-566afd725d1b/2d206a39-8ed7-437e-a3be-862e0f06eea3/fingerprint&token=058cbc6a-f8b8-f175-ca68-8c2e0fd6a4e3:1:1884)
DevTools failed to parse SourceMap: https://www.marketwatch.com/149e9513-01fa-4fb0-aad4-566afd725d1b/2d206a39-8ed7-437e-a3be-862e0f06eea3/fingerprint/script/fingerprint.js.map
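
To notice this situation from the script itself, one option is to check whether the returned page_source has any text inside its body before parsing it further. The following is a minimal sketch using only the standard library; the names `_BodyTextCollector` and `body_is_empty` are hypothetical, and treating script/style content as "not text" is an assumption of this sketch.

```python
from html.parser import HTMLParser


class _BodyTextCollector(HTMLParser):
    """Collects non-blank text found inside <body>, ignoring script/style."""

    def __init__(self):
        super().__init__()
        self._in_body = False
        self._skip = False  # True while inside <script> or <style>
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self._in_body = True
        elif tag in ("script", "style"):
            self._skip = True

    def handle_endtag(self, tag):
        if tag == "body":
            self._in_body = False
        elif tag in ("script", "style"):
            self._skip = False

    def handle_data(self, data):
        if self._in_body and not self._skip and data.strip():
            self.chunks.append(data.strip())


def body_is_empty(html):
    """Return True if the HTML document has no visible text in its body."""
    parser = _BodyTextCollector()
    parser.feed(html)
    parser.close()
    return not parser.chunks
```

In the question's code this could be called as `body_is_empty(getBondRate())`. It only reports the empty body; it does not bypass the fingerprint detection.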

DevTools snapshot: [image not included]

You can find a related discussion in:

Selenium unable to extract the page source and returns a blank body for the HTML page
