获取网页源代码,包括javascript

时间:2012-09-16 16:13:43

标签: autoit

我正在尝试使用AutoIt创建Google关键字工具抓取工具。 我正在使用以下代码:

#include <IE.au3> 
$oIE = _IECreate ("https://adwords.google.com/o/KeywordTool")
sleep(20000)
$source = _IEDocReadHTML  ($oIE)

MsgBox(0,'',$source)

(睡眠是为了让我有时间输入查询并点击IE窗口中的搜索 - 将来我会自动执行此操作)

它输出的HTML源代码不包含带有结果的表格,尽管我可以在Firebug中看到它。 下面是我用Firebug提取的一个单行。

<tr __gwt_row="19" __gwt_subrow="0" class="sCT"><td class="sBS sDT sES" align="left"><div style="outline-style:none;" __gwt_cell="cell-gwt-uid-1059"><div id="gwt-debug-column-SELECTION-row-19-0"><input type="checkbox" class="sML"></div></div></td><td class="sBS sDT" align="left"><div style="outline-style:none;" __gwt_cell="cell-gwt-uid-1060"><div id="gwt-debug-column-KEYWORD-row-19-1"><span style="white-space:nowrap"><span></span><span><a class="sOL" gwtuirendered="gwt-uid-1089"><b>windows</b> live</a></span><span></span></span></div></div></td><td class="sBS sDT" align="left"><div style="outline-style:none;" __gwt_cell="cell-gwt-uid-1062"><div id="gwt-debug-column-COMPETITION-row-19-2"><div title="0,04">Bassa</div></div></div></td><td class="sBS sDT" align="right"><div style="outline-style:none;" __gwt_cell="cell-gwt-uid-1063"><div id="gwt-debug-column-GLOBAL_MONTHLY_SEARCHES-row-19-3">20.400.000</div></div></td><td class="sBS sDT" align="right"><div style="outline-style:none;" __gwt_cell="cell-gwt-uid-1064"><div id="gwt-debug-column-AVERAGE_TARGETED_MONTHLY_SEARCHES-row-19-4">20.400.000</div></div></td><td class="sBS sDT" align="right"><div style="outline-style:none;" __gwt_cell="cell-gwt-uid-1065"><div id="gwt-debug-column-SUGGESTED_BID-row-19-5">€&nbsp;0,40</div></div></td><td class="sBS sDT aw-ti-advertiser-specific-cell" align="right"><div style="outline-style:none;" __gwt_cell="cell-gwt-uid-1066"><div id="gwt-debug-column-AD_SHARE-row-19-6">-</div></div></td><td class="sBS sDT" align="right"><div style="outline-style:none;" __gwt_cell="cell-gwt-uid-1067"><div id="gwt-debug-column-AVERAGE_MONTHLY_SEARCHES_WITH_AFS-row-19-7">-</div></div></td><td class="sBS sDT aw-ti-advertiser-specific-cell" align="right"><div style="outline-style:none;" __gwt_cell="cell-gwt-uid-1068"><div id="gwt-debug-column-SEARCH_SHARE-row-19-8">-</div></div></td><td class="sBS sDT" align="right"><div style="outline-style:none;" __gwt_cell="cell-gwt-uid-1069"><div id="gwt-debug-column-TARGETED_MONTHLY_SEARCHES-row-19-9"><div style="width: 108px; white-space: nowrap" dir="ltr"><div style="width: 8px;height: 16px; background-color: #A4D0BB; vertical-align:bottom;" class="goog-inline-block" title=""></div><div style="width: 1px;" class="goog-inline-block"></div><div style="width: 8px;height: 16px; background-color: #A4D0BB; vertical-align:bottom;" class="goog-inline-block" title=""></div><div style="width: 1px;" class="goog-inline-block"></div><div style="width: 8px;height: 13px; background-color: #A4D0BB; vertical-align:bottom;" class="goog-inline-block" title=""></div><div style="width: 1px;" class="goog-inline-block"></div><div style="width: 8px;height: 13px; background-color: #A4D0BB; vertical-align:bottom;" class="goog-inline-block" title=""></div><div style="width: 1px;" class="goog-inline-block"></div><div style="width: 8px;height: 16px; background-color: #A4D0BB; vertical-align:bottom;" class="goog-inline-block" title=""></div><div style="width: 1px;" class="goog-inline-block"></div><div style="width: 8px;height: 13px; background-color: #A4D0BB; vertical-align:bottom;" class="goog-inline-block" title=""></div><div style="width: 1px;" class="goog-inline-block"></div><div style="width: 8px;height: 13px; background-color: #A4D0BB; vertical-align:bottom;" class="goog-inline-block" title=""></div><div style="width: 1px;" class="goog-inline-block"></div><div style="width: 8px;height: 10px; background-color: #A4D0BB; vertical-align:bottom;" class="goog-inline-block" title=""></div><div style="width: 1px;" class="goog-inline-block"></div><div style="width: 8px;height: 10px; background-color: #A4D0BB; vertical-align:bottom;" class="goog-inline-block" title=""></div><div style="width: 1px;" class="goog-inline-block"></div><div style="width: 8px;height: 10px; background-color: #A4D0BB; vertical-align:bottom;" class="goog-inline-block" title=""></div><div style="width: 1px;" class="goog-inline-block"></div><div style="width: 8px;height: 10px; background-color: #A4D0BB; vertical-align:bottom;" class="goog-inline-block" title=""></div><div style="width: 1px;" class="goog-inline-block"></div><div style="width: 8px;height: 10px; background-color: #A4D0BB; vertical-align:bottom;" class="goog-inline-block" title=""></div></div></div></div></td><td class="sBS sDT sOS" align="left"><div style="outline-style:none;" __gwt_cell="cell-gwt-uid-1070"><div id="gwt-debug-column-EXTRACTED_FROM_WEBPAGE-row-19-10">-</div></div></td></tr>

有没有办法用Autoit获取完整的源代码,包括使用javascript生成的内容?

1 个答案:

答案 0 :(得分:0)

我会使用http请求,因为这是最直接的方法。 它似乎顺便给出了状态404 编辑:网址丢失了最后一封信,导致404状态。

#include <GUIConstantsEx.au3>
#include <winapi.au3>


MsgBox(0,default,get_url("https://adwords.google.com/o/KeywordTool"))
  Func get_url($url)

    $RequestURL = $url;
    Global $oHTTP = ObjCreate("winhttp.winhttprequest.5.1") ;
    $oHTTP.Open("GET", $RequestURL, False)
    $oHTTP.Send()
    if  $oHTTP.status == 200 Then
        Return $oHTTP.ResponseText
    Else
         Return  "ooops... status: " & $oHTTP.status  
    EndIf

EndFunc