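I'm using WebDriver with Java to get the page source. With FirefoxDriver I'm trying to verify some text in the page source, but when I call driver.getPageSource(), some symbols get converted, such as < to &lt; and > to &gt;, which makes it hard for me to verify the content.
Can someone tell me how to avoid this?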
<noscript>
<div id="noScriptContainer">
<p>JavaScript is not enabled! Either you have disabled it or your browser does not support it. Because of this, you will not be able to view our pages or use our site features. Please turn on JavaScript in your browser settings or upgrade your browser version to use our site. </p>
</div>
</noscript>
is converted to:
<noscript>
&lt;div id="noScriptContainer"&gt;
&lt;p&gt;JavaScript is not enabled! Either you have disabled it or your browser does not support it. Because of this, you will not be able to view our pages or use our site features. Please turn on JavaScript in your browser settings or upgrade your browser version to use our site. &lt;/p&gt;
&lt;/div&gt;
</noscript>
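If the goal is only to check that some markup is present, one workaround is to search the source for both the literal markup and its entity-escaped form. A minimal sketch, assuming the serializer escapes only &, < and > (the class and method names are hypothetical, not part of Selenium):

```java
class EscapedSourceCheck {

    // Escapes the three characters this sketch assumes the page source
    // serializer replaces with entities. '&' is handled first so the
    // ampersands introduced by "&lt;" and "&gt;" are not escaped again.
    static String escapeBasicHtml(String text) {
        return text.replace("&", "&amp;")
                   .replace("<", "&lt;")
                   .replace(">", "&gt;");
    }

    // True if the expected markup appears in the source either as-is
    // or in its entity-escaped form.
    static boolean containsMarkup(String pageSource, String expected) {
        return pageSource.contains(expected)
                || pageSource.contains(escapeBasicHtml(expected));
    }

    public static void main(String[] args) {
        // Shaped like the escaped <noscript> content shown above
        String escapedSource = "<noscript>\n&lt;div id=\"noScriptContainer\"&gt;\n</noscript>";
        System.out.println(containsMarkup(escapedSource, "<div id=\"noScriptContainer\">")); // true
    }
}
```

In a test, pageSource would be the string returned by driver.getPageSource(). This only sidesteps the escaping for the check itself; it does not recover the original markup.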
Answer 0 (score: 1)
It is generally better not to use WebDriver's getPageSource() method and instead to get the page source through JavaScript with JavascriptExecutor:
String pageSource = ((JavascriptExecutor) driver).executeScript("return document.documentElement.outerHTML;").toString();
Answer 1 (score: 0)
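Yes, this is a problem with child elements. You can use JavaScript as already suggested, or you can unescape the HTML entities in the content you get back (&lt; to <, &gt; to >) to recover the original source. URL decoding with java.net.URLDecoder does not do this, because it only handles percent-encoding such as %3C; an HTML unescaper does, for example the one in Apache Commons Text:
// requires the org.apache.commons:commons-text dependency
String result = org.apache.commons.text.StringEscapeUtils.unescapeHtml4(driver.getPageSource());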
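Decoding the entities by hand is also possible. A minimal, dependency-free sketch, assuming the escaped source only contains &lt;, &gt;, &quot; and &amp; (the class and method names are hypothetical; numeric references such as &#60; and other named entities are not handled):

```java
import java.util.LinkedHashMap;
import java.util.Map;

class BasicHtmlUnescape {

    // Entities this sketch assumes the serializer produces, in replacement order.
    // "&amp;" comes last so that "&amp;lt;" decodes to the literal text "&lt;"
    // instead of being decoded twice into "<".
    private static final Map<String, String> ENTITIES = new LinkedHashMap<>();
    static {
        ENTITIES.put("&lt;", "<");
        ENTITIES.put("&gt;", ">");
        ENTITIES.put("&quot;", "\"");
        ENTITIES.put("&amp;", "&");
    }

    // Replaces each known entity with the character it stands for.
    static String unescape(String text) {
        String result = text;
        for (Map.Entry<String, String> entry : ENTITIES.entrySet()) {
            result = result.replace(entry.getKey(), entry.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        String escaped = "&lt;p&gt;JavaScript is not enabled!&lt;/p&gt;";
        System.out.println(unescape(escaped)); // <p>JavaScript is not enabled!</p>
    }
}
```

The ordering matters: decoding &amp; first would turn an escaped literal like &amp;lt; into < on the next step.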