I ran a very small test:
var page = require('webpage').create(),
    fs = require('fs');

page.open("http://www.google.it/search?q=web+design", function (status) {
    if (status === 'success') {
        // Save a screenshot and the page source as PhantomJS sees them.
        page.render('google.png');
        fs.write("source.html", page.content, 'w');
    }
    phantom.exit();
});
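As you can see, I'm searching google.it for "web design".
Now, looking at source.html, I noticed differences between the source generated by PhantomJS and the real HTML (as shown in Chrome's inspector).
In my source, a result contains the following code: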
<li class="g">
<h3 class="r"><a href="/url?q=http://www.html.it/web-design/&sa=U&ei=Z2LZUbSaBcGV7Abm54BI&ved=0CCwQFjAB&usg=AFQjCNGagkxLs36cXSzGjyhnBX7duCI6dA"><b>WebDesign</b> - Guide e approfondimenti per webdesigner - HTML.it</a></h3>
<div class="s">
<div class="kv" style="margin-bottom:2px"><cite>www.html.it/<b>web</b>-<b>design</b>/</cite><span class="flc"> - <a href="/url?q=http://webcache.googleusercontent.com/search%3Fq%3Dcache:3GWnT4NPDr0J:http://www.html.it/web-design/%252Bweb%2Bdesign%26hl%3Dit%26ct%3Dclnk&sa=U&ei=Z2LZUbSaBcGV7Abm54BI&ved=0CC0QIDAB&usg=AFQjCNE_1Gt5RL9WQAGZpM_3f-oxZ1VR9w">Copia cache</a></span></div>
<span class="st">WebDesign: progettazione Web, User Experience, Architettura dell'informazione, <br> i consigli di esperti designer in guide e articoli di approfondimento in italiano.</span><br>
</div>
</li>
But the real source (as seen in Chrome's inspector) is:
<li class="g">
<!--m-->
<div data-hveid="55" class="rc">
<span style="float:left"></span>
<h3 class="r"><a href="/url?sa=t&rct=j&q=&esrc=s&source=web&cd=2&cad=rja&ved=0CDgQFjAB&url=http%3A%2F%2Fwww.html.it%2Fweb-design%2F&ei=wmTZUfHdOYSO7AagwIHwDw&usg=AFQjCNFaDZWWczDbce8TlYh9oqYluJ-E5g&bvm=bv.48705608,d.ZGU" onmousedown="return rwt(this,'','','','2','AFQjCNFaDZWWczDbce8TlYh9oqYluJ-E5g','','0CDgQFjAB','','',event)"><em>WebDesign</em> - Guide e approfondimenti per webdesigner - HTML.it</a></h3>
<div class="s">
<div>
<div class="f kv" style="white-space:nowrap">
<cite>www.html.it/<b>web</b>-<b>design</b>/</cite>
<div class="action-menu ab_ctl">
<a href="#" data-ved="0CDkQ7B0wAQ" class="clickable-dropdown-arrow ab_button" id="am-b1" aria-label="Dettagli risultato" jsaction="ab.tdd; keydown:ab.hbke; keypress:ab.mskpe" role="button" aria-haspopup="true" aria-expanded="false"><span class="mn-dwn-arw"></span></a>
<div data-ved="0CDoQqR8wAQ" class="action-menu-panel ab_dropdown" jsaction="keydown:ab.hdke; mouseover:ab.hdhne; mouseout:ab.hdhue" role="menu" tabindex="-1">
<ul>
<li class="action-menu-item ab_dropdownitem" role="menuitem"><a href="http://webcache.googleusercontent.com/search?q=cache:3GWnT4NPDr0J:www.html.it/web-design/+&cd=2&hl=it&ct=clnk&gl=it&client=ubuntu" onmousedown="return rwt(this,'','','','2','AFQjCNEaothLaL83HBobw4UE8q_OpkIPrw','','0CDsQIDAB','','',event)" class="fl">Copia cache</a></li>
</ul>
</div>
</div>
</div>
<div class="f slp"></div>
<span class="st"><em>WebDesign</em>: progettazione Web, User Experience, Architettura dell'informazione, i consigli di esperti designer in guide e articoli di approfondimento in italiano.</span>
</div>
</div>
</div>
<!--n-->
</li>
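As you can see, the second snippet is more complete.
So my question is:
Why do these results have different markup?
I've read that PhantomJS executes all the JavaScript inside the page, just as a browser does, so why do these differences appear?
Thanks!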
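Answer 0 (score: 2)
Because PhantomJS sends a different user agent. If you change the user agent to Google Chrome's, you will get the same results as in Google Chrome.
You can change the user agent through the page.settings.userAgent property.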
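As a rough sketch of that suggestion (not tested against Google itself), the user agent can be set before the page is opened. The user-agent string below is only an assumed, Chrome-like example; substitute the one your own browser sends. The helper name useChromeUserAgent is hypothetical too and is not part of PhantomJS:

```javascript
// Assumed Chrome-like user-agent string, for illustration only.
var CHROME_UA = 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 ' +
                '(KHTML, like Gecko) Chrome/27.0.1453.110 Safari/537.36';

// Hypothetical helper: sets the user agent on a PhantomJS page object.
// It must be called before page.open() for the setting to apply to the request.
function useChromeUserAgent(page) {
    page.settings.userAgent = CHROME_UA;
    return page;
}

// This part only runs under PhantomJS, where the global `phantom` exists.
if (typeof phantom !== 'undefined') {
    var fs = require('fs');
    var page = useChromeUserAgent(require('webpage').create());

    page.open('http://www.google.it/search?q=web+design', function (status) {
        if (status === 'success') {
            fs.write('source.html', page.content, 'w');
        }
        phantom.exit();
    });
}
```

Comparing the new source.html with the earlier one should show whether the markup served depends on the user agent.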
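Answer 1 (score: 1)
Maybe try waiting until all the DOM transformations performed by Google's JS code have run. For example, this can be done by waiting for the .action-menu element to become available (disclaimer: as the casperjs author, I'm using casperjs here):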
var fs = require('fs');

require('casper').create()
    .start("http://www.google.it/search?q=web+design")
    .waitForSelector(".action-menu", function () {
        // Runs once the .action-menu element exists in the DOM.
        this.capture('google.png');
        fs.write("source.html", this.getPageContent(), 'w');
    })
    .run();
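For anyone who prefers to stay with plain PhantomJS, the same idea could be sketched with a small polling loop. The waitFor helper, the 5000 ms timeout and the 100 ms interval are all assumptions made for illustration; they are not part of PhantomJS or casperjs:

```javascript
// Hypothetical helper: calls `check` every `intervalMs` milliseconds until it
// returns true (then calls onReady) or `timeoutMs` has elapsed (then onTimeout).
function waitFor(check, onReady, onTimeout, timeoutMs, intervalMs) {
    var start = Date.now();
    var timer = setInterval(function () {
        if (check()) {
            clearInterval(timer);
            onReady();
        } else if (Date.now() - start >= timeoutMs) {
            clearInterval(timer);
            onTimeout();
        }
    }, intervalMs);
}

// This part only runs under PhantomJS, where the global `phantom` exists.
if (typeof phantom !== 'undefined') {
    var fs = require('fs');
    var page = require('webpage').create();

    page.open('http://www.google.it/search?q=web+design', function (status) {
        if (status !== 'success') {
            phantom.exit(1);
            return;
        }
        waitFor(function () {
            // Evaluated inside the page: has the action menu been added yet?
            return page.evaluate(function () {
                return document.querySelector('.action-menu') !== null;
            });
        }, function () {
            page.render('google.png');
            fs.write('source.html', page.content, 'w');
            phantom.exit();
        }, function () {
            // Assumed behaviour: give up with a non-zero exit code.
            phantom.exit(1);
        }, 5000, 100);
    });
}
```

If the .action-menu element never appears (for example because Google serves simpler markup to the default user agent), the script exits through the timeout branch instead of hanging.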