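I'm trying to copy a table from a web page. I can't copy the whole page, because it contains buttons and dynamic elements, and pasting all of that into the sheet overloads memory and breaks the code. So instead I'm trying to pull the HTML and paste just the table into Excel.
When I copy the entire page source into Word, it reports about 23k characters, but when I use innerHTML or outerHTML, the result is only around 15-16k characters long.
I know that both innerHTML and outerHTML leave out a lot, such as everything outside the HTML body. What confuses me is that they are missing the table I need, which sits in the middle of the code.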
The relevant part of the site's HTML:
<div class="row" >
<div class="col-lg-12 col-md-12 col-sm-12" >
</div>
<div class="col-lg-12 col-md-12 col-sm-12" >
<table class="table table-hover table-bordered table-striped " >
<thead>
<tr style="background:#eee">
<th class="sortable" ><a href="/employer/report/report?_action_report=Run+Report&show_advertisers=on&_show_conversions=&advertiser_id=25&_show_advertisers=&_hide_campaigns=&_show_campaigns=&end=03%2F11%2F2015&campaign_id=&begin=03%2F11%2F2015&sort=day&order=asc">Date</a></th>
<th class="sortable" ><a href="/employer/report/report?_action_report=Run+Report&show_advertisers=on&_show_conversions=&advertiser_id=25&_show_advertisers=&_hide_campaigns=&_show_campaigns=&end=03%2F11%2F2015&campaign_id=&begin=03%2F11%2F2015&sort=jobs&order=asc">Current Jobs Listed</a></th>
<th class="sortable" ><a href="/employer/report/report?_action_report=Run+Report&show_advertisers=on&_show_conversions=&advertiser_id=25&_show_advertisers=&_hide_campaigns=&_show_campaigns=&end=03%2F11%2F2015&campaign_id=&begin=03%2F11%2F2015&sort=impressions&order=asc">Impressions</a></th>
<th class="sortable" ><a href="/employer/report/report?_action_report=Run+Report&show_advertisers=on&_show_conversions=&advertiser_id=25&_show_advertisers=&_hide_campaigns=&_show_campaigns=&end=03%2F11%2F2015&campaign_id=&begin=03%2F11%2F2015&sort=clicks&order=asc">Clicks</a></th>
<th class="sortable" ><a href="/employer/report/report?_action_report=Run+Report&show_advertisers=on&_show_conversions=&advertiser_id=25&_show_advertisers=&_hide_campaigns=&_show_campaigns=&end=03%2F11%2F2015&campaign_id=&begin=03%2F11%2F2015&sort=cpc&order=asc">CPC</a></th>
<th class="sortable" ><a href="/employer/report/report?_action_report=Run+Report&show_advertisers=on&_show_conversions=&advertiser_id=25&_show_advertisers=&_hide_campaigns=&_show_campaigns=&end=03%2F11%2F2015&campaign_id=&begin=03%2F11%2F2015&sort=ctr&order=asc">CTR</a></th>
<th class="sortable" ><a href="/employer/report/report?_action_report=Run+Report&show_advertisers=on&_show_conversions=&advertiser_id=25&_show_advertisers=&_hide_campaigns=&_show_campaigns=&end=03%2F11%2F2015&campaign_id=&begin=03%2F11%2F2015&sort=cost&order=asc">Estimated cost</a></th>
<th class="sortable" ><a href="/employer/report/report?_action_report=Run+Report&show_advertisers=on&_show_conversions=&advertiser_id=25&_show_advertisers=&_hide_campaigns=&_show_campaigns=&end=03%2F11%2F2015&campaign_id=&begin=03%2F11%2F2015&sort=daily_budget&order=asc">Current Daily Budget</a></th>
<th style="vertical-align:top" ><a href="#" onclick="return false;">Edit Campaign</a></th>
<th style="vertical-align:top" ></th>
</tr>
</thead>
<tbody>
<tr class="odd 2015-03-11">
<td>2015-03-11</td>
<td class="jobsListed" >437879</td>
<td>148397</td>
<td>1379</td>
<td>$0.36</td>
<td>0.93%</td>
<td >$491.16</td>
<td class="dailyBudget">$15500.00</td>
<td ><a href="/employer/campaign/">Edit</a></td>
</tr>
<tr class="dg" >
<td colspan="1" class="text-right"><b>Total:</b></td>
<td class="jobsListed" >437879</td>
<td>148397</td>
<td>1379</td>
<td>$0.36</td>
<td>0.93%</td>
<td >$491.16</td>
<td class="dailyBudget">$15500.00</td>
<td ></td>
<td ></td>
</tr>
</tbody>
</table>
</div>
</div>
</div><!--container ends here -->
Here is how I'm trying to get the table data:
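' DataObject requires a reference to Microsoft Forms 2.0 Object Library
Dim appIE As Object ' InternetExplorer.Application
Set appIE = CreateObject("InternetExplorer.Application")
' (navigation to the report page and the wait for it to load are omitted here)
Dim strSource As String
Dim TableString As String
Dim startPos As Long, endPos As Long
strSource = CStr(appIE.document.body.outerHTML)
startPos = InStr(strSource, "<table")   ' 0 if not found, which makes Mid fail
endPos = InStr(strSource, "</table>")
' Extend the length by Len("</table>") so the closing tag is kept
TableString = Mid(strSource, startPos, endPos + Len("</table>") - startPos)
Dim ClipBoard As New DataObject
ClipBoard.SetText TableString
ClipBoard.PutInClipboard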
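It gives me an error because it can't find <table in the string. I stepped through the string several times and found that, where the table should be, there is only this: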
class="col-lg-12 col-md-12 col-sm-12">
</div>
</div>
</div><!--container ends here -->
Any ideas? Thanks.
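For comparison, here is a guarded sketch of the extraction step. ExtractTableHtml is a hypothetical name, not part of the question's code. It returns an empty string when no table is present (which is what happens when IE is really on a different page) instead of letting Mid fail with a start position of 0, it keeps the closing </table> tag, and it searches case-insensitively with vbTextCompare, on the assumption that some older IE document modes may serialize tag names in upper case.

```vba
Option Explicit

' Hypothetical helper: returns the first <table>...</table> block in sourceHtml,
' including the closing tag, or an empty string if no complete table is present.
Public Function ExtractTableHtml(ByVal sourceHtml As String) As String
    Const OPEN_TAG As String = "<table"
    Const CLOSE_TAG As String = "</table>"
    Dim startPos As Long, endPos As Long

    ' vbTextCompare makes the search case-insensitive, so <TABLE> also matches
    startPos = InStr(1, sourceHtml, OPEN_TAG, vbTextCompare)
    If startPos = 0 Then Exit Function   ' no table: return ""

    endPos = InStr(startPos, sourceHtml, CLOSE_TAG, vbTextCompare)
    If endPos = 0 Then Exit Function     ' unterminated table: return ""

    ' Length runs through the last character of the closing tag
    ExtractTableHtml = Mid$(sourceHtml, startPos, endPos + Len(CLOSE_TAG) - startPos)
End Function
```

In the question's code, TableString = ExtractTableHtml(strSource) followed by an If Len(TableString) = 0 Then check (for example, printing appIE.LocationURL to the Immediate window) could replace the Mid/InStr lines.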
Answer:
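I finally figured out what the problem was!
IE was displaying the page, but as far as the code was concerned it was still on the login screen. I was able to see this by checking appIE.LocationURL in the Immediate window.
So the code couldn't find the table, because the table doesn't exist on the login page.
The fix was simple: after loading, check appIE.LocationURL, and if it isn't the expected URL, create a new IE instance and navigate again. Also note that appIE.Quit only closes the most recent window. Code: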
MakeIE:
    Set appIE = CreateObject("InternetExplorer.Application")
    ...
    With appIE
        .Navigate sURL
        Application.Wait (Now + TimeValue("00:00:01"))
        .Visible = True
        .Height = 500
        .Width = 500
        Application.Wait (Now + TimeValue("00:00:01"))
        ' loop until the page finishes loading (4 = READYSTATE_COMPLETE)
        Do Until .ReadyState = 4: DoEvents: Loop
    End With
    ....
    ' If IE is not on the expected page (e.g. still on the login screen), start over
    If appIE.LocationURL <> sURL Then GoTo MakeIE
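The GoTo MakeIE loop above has two side effects worth noting: it retries forever if the URL never matches, and each retry leaves the previous IE instance running, because reassigning appIE does not close the old window. A bounded retry could look like the following sketch. OpenAt and its maxTries parameter (default 3) are hypothetical. It keeps the answer's exact comparison of LocationURL against sURL, so a redirect or a trailing-slash difference would still count as a failed attempt, and it waits on Busy as well as ReadyState, which is a common pattern rather than something the answer uses.

```vba
Option Explicit

' Hypothetical helper: opens sURL in a new IE instance, retrying up to maxTries
' times. Returns the InternetExplorer object once LocationURL equals sURL,
' or Nothing if every attempt ended on a different page (e.g. a login screen).
Public Function OpenAt(ByVal sURL As String, Optional ByVal maxTries As Long = 3) As Object
    Dim ie As Object
    Dim attempt As Long

    For attempt = 1 To maxTries
        Set ie = CreateObject("InternetExplorer.Application")
        ie.Visible = True
        ie.Navigate sURL

        ' Wait until IE is idle and reports READYSTATE_COMPLETE (4)
        Do While ie.Busy Or ie.ReadyState <> 4
            DoEvents
        Loop

        If ie.LocationURL = sURL Then
            Set OpenAt = ie
            Exit Function
        End If

        ' Wrong page: close this instance before trying again
        ie.Quit
        Set ie = Nothing
    Next attempt
    ' Falling through leaves OpenAt = Nothing
End Function
```

Usage would be Set appIE = OpenAt(sURL), followed by an If appIE Is Nothing Then check before reading appIE.document.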
Code to kill all IE windows (use with caution: it terminates every IE process):
Option Explicit

Sub IE_Sledgehammer()
    Dim objWMI As Object, objProcess As Object, objProcesses As Object
    Set objWMI = GetObject("winmgmts://.")
    Set objProcesses = objWMI.ExecQuery( _
        "SELECT * FROM Win32_Process WHERE Name = 'iexplore.exe'")
    For Each objProcess In objProcesses
        On Error Resume Next
        Call objProcess.Terminate
        On Error GoTo 0
    Next
    Application.Wait (Now + TimeValue("0:00:03"))
    Set objProcesses = Nothing: Set objWMI = Nothing
    Application.Wait (Now + TimeValue("0:00:03"))
End Sub
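If terminating every iexplore.exe process is too drastic, a gentler sketch is to ask each IE window to quit through the Shell.Application window list. CloseAllIEWindows is a hypothetical name. Matching on a window's FullName ending in iexplore.exe is an assumption about how IE windows are told apart from File Explorer windows, and unlike the WMI approach this may not reach hung or hidden instances that are not listed as shell windows.

```vba
Option Explicit

' Hypothetical helper: sends Quit to every Internet Explorer window listed by
' Shell.Application and returns how many windows it asked to close.
Public Function CloseAllIEWindows() As Long
    Dim wins As Object, w As Object
    Dim i As Long, closed As Long
    Dim hostPath As String

    ' ShellWindows lists both File Explorer and IE windows (zero-based index)
    Set wins = CreateObject("Shell.Application").Windows

    ' Walk backwards because closing a window shrinks the collection
    For i = wins.Count - 1 To 0 Step -1
        Set w = Nothing
        hostPath = ""
        On Error Resume Next          ' an entry may vanish or be unavailable
        Set w = wins.Item(i)
        hostPath = LCase$(w.FullName) ' full path of the hosting executable
        On Error GoTo 0

        If hostPath Like "*iexplore.exe" Then
            w.Quit
            closed = closed + 1
        End If
    Next i

    CloseAllIEWindows = closed
End Function
```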