Question

While using Selenium with the Firefox driver, I ran into an error when parsing the output of the getPageSource() method.

The actual meta tag in the page source:

  <meta name="news_keywords" content="devo max,independence vote,no campaign,referendum,scotland \"no\" vote,scotland independence,scotland powers,scotland referendum,scotland vote,scottish referendum" />

The result of the getPageSource() method using the Firefox driver with Selenium:

<meta referendum"="" vote,scottish="" referendum,scotland="" powers,scotland="" independence,scotland="" vote,scotland="" no\"="" content="devo max,independence vote,no campaign,referendum,scotland \" name="news_keywords" />

This is quite absurd, and it causes problems when I process the HTML output further. Any suggestions, help, or workarounds?

Answer 1

From the documentation:

getPageSource

java.lang.String getPageSource()

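Get the source of the last loaded page. If the page has been modified after loading (for example, by Javascript) there is no guarantee that the returned text is that of the modified page. Please consult the documentation of the particular driver being used to determine whether the returned text reflects the current state of the page or the text last sent by the web server. The page source returned is a representation of the underlying DOM: do not expect it to be formatted or escaped in the same way as the response sent from the web server. Think of it as an artist's impression.

Returns: The source of the current page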

http://selenium.googlecode.com/git/docs/api/java/org/openqa/selenium/WebDriver.html#getPageSource%28%29
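
As a possible workaround, and only a sketch: HTML attribute values have no backslash escape, so the browser ends `content` at the quote in `\"`, and getPageSource() serializes the DOM that results. If the original value is needed, one option is to read the response as sent by the server and extract the attribute with a pattern that tolerates `\"`. The class `RawMetaTag`, its methods, and the assumption that `name` comes directly before `content` (as in the tag above) are illustrative and not part of Selenium. `fetchRaw` uses the JDK 11+ `java.net.http` client, so it will not see cookies or JavaScript changes from the browser session.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: reads a meta tag from the raw server response
// instead of from the DOM serialization returned by getPageSource().
final class RawMetaTag {

    // Fetch the page body exactly as the web server sends it.
    static String fetchRaw(String url) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }

    // Extract the content attribute of <meta name="NAME" content="...">.
    // Assumes name is immediately followed by content, and treats \" inside
    // the value as part of the value rather than as its end.
    static Optional<String> extractContent(String html, String name) {
        Pattern pattern = Pattern.compile(
                "<meta\\s+name=\"" + Pattern.quote(name)
                        + "\"\\s+content=\"((?:\\\\.|[^\"\\\\])*)\"",
                Pattern.CASE_INSENSITIVE);
        Matcher matcher = pattern.matcher(html);
        if (!matcher.find()) {
            return Optional.empty();
        }
        // Turn the page's \" sequences back into plain quotes.
        return Optional.of(matcher.group(1).replace("\\\"", "\""));
    }

    public static void main(String[] args) {
        String html = "<meta name=\"news_keywords\" "
                + "content=\"referendum,scotland \\\"no\\\" vote,scotland vote\" />";
        System.out.println(extractContent(html, "news_keywords").orElse("(not found)"));
    }
}
```

For a live page, pass `fetchRaw(driver.getCurrentUrl())` as the first argument to `extractContent` instead of the literal string.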
