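I am trying to extract all hyperlinks that contain "http://www.bursamalaysia.com/market/listed-companies/company-announcements/" from the web pages I enter.
At first the code worked well, but later I ran into a problem: the URL links I need are no longer extracted. Whenever I run the sub, they go missing.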
    Sub scrapeHyperlinks()
        Dim IE As InternetExplorer
        Dim html As HTMLDocument
        Dim ElementCol As Object
        Dim Link As Object
        Dim erow As Long

        Application.ScreenUpdating = False
        Set IE = New InternetExplorer

        For u = 1 To 50
            IE.Visible = False
            IE.navigate Cells(u, 2).Value
            Do While IE.readyState <> READYSTATE_COMPLETE
                Application.StatusBar = "Trying to go to websitehahaha"
                DoEvents
            Loop
            Set html = IE.document
            Set ElementCol = html.getElementsByTagName("a")
            For Each Link In ElementCol
                erow = Worksheets("Sheet1").Cells(Rows.Count, 1).End(xlUp).Offset(1, 0).Row
                Cells(erow, 1).Value = Link
                Cells(erow, 1).Columns.AutoFit
            Next
        Next u

        ActiveSheet.Range("$A$1:$A$152184").AutoFilter Field:=1, Criteria1:="http://www.bursamalaysia.com/market/listed-companies/company-announcements/???????", Operator:=xlAnd
        For k = 1 To [A65536].End(xlUp).Row
            If Rows(k).Hidden = True Then
                Rows(k).EntireRow.Delete
                k = k - 1
            End If
        Next k

        Set IE = Nothing
        Application.StatusBar = ""
        Application.ScreenUpdating = True
    End Sub
Answer 0 (score: 1)
To get just the qualifying hrefs you mention from the given URL, I would use the following. It uses a CSS selector combination to target the URLs of interest on the specified page.

The CSS selector combination is:

    #bm_ajax_container [href^='/market/listed-companies/company-announcements/']

This is a descendant selector. It looks for elements that have an href attribute whose value starts with /market/listed-companies/company-announcements/ and that sit inside an element with the ID bm_ajax_container. That element is the ajax container div. "#" is an ID selector, "[]" is an attribute selector, and "^" means "starts with".
Example of the container div and the first matching href: (screenshot not included)
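Because more than one element has to be matched, the CSS selector combination is applied with the querySelectorAll method. This returns a nodeList, whose .Length can be looped over to access individual items by index.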
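The full qualifying links are written out to the worksheet.
Sample of the CSS query results from the page using the selector: (screenshot not included)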
VBA:
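The answer's code listing is not present in this copy, so what follows is only a minimal sketch of the approach described above, not the original author's code. It assumes references to Microsoft Internet Controls and Microsoft HTML Object Library are set, that the page URLs sit in column B of "Sheet1" (as in the question), and that Internet Explorer renders the page in a document mode that supports querySelectorAll. The sub name GetAnnouncementLinks and the choice to write results to column A from row 1 are illustrative. If the ajax container fills in after the page reports READYSTATE_COMPLETE, an additional wait would be needed.

```vba
Option Explicit

' Sketch only. Requires references (Tools > References) to:
'   Microsoft Internet Controls, Microsoft HTML Object Library
Public Sub GetAnnouncementLinks()
    ' Selector taken from the answer text above
    Const SELECTOR As String = _
        "#bm_ajax_container [href^='/market/listed-companies/company-announcements/']"

    Dim ie As InternetExplorer
    Dim links As Object          ' nodeList returned by querySelectorAll
    Dim ws As Worksheet
    Dim u As Long, i As Long, nextRow As Long

    Set ws = ThisWorkbook.Worksheets("Sheet1")   ' assumed sheet name, as in the question
    Set ie = New InternetExplorer
    ie.Visible = False
    nextRow = 1                                  ' assumed output start row in column A

    For u = 1 To 50
        If Len(ws.Cells(u, 2).Value) > 0 Then
            ie.navigate ws.Cells(u, 2).Value
            Do While ie.Busy Or ie.readyState <> READYSTATE_COMPLETE
                DoEvents
            Loop

            ' Only anchors inside the ajax container whose href starts with the path
            Set links = ie.document.querySelectorAll(SELECTOR)

            ' Walk the nodeList by index, from 0 to Length - 1
            For i = 0 To links.Length - 1
                ' .href gives the fully resolved link
                ws.Cells(nextRow, 1).Value = links.item(i).href
                nextRow = nextRow + 1
            Next i
        End If
    Next u

    ie.Quit
    Set ie = Nothing
End Sub
```

Because the selector already restricts the matches, the AutoFilter and row-deletion pass from the question is not needed in this sketch.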