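I wrote this code to extract mobile phone numbers from web links. I have three links in a ListBox and use the code below to fetch the source of each page. However, when I try to extract the phone numbers with a RegEx, I get the same number over and over. The complete code is below. The site I am extracting the links from is: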
http://bolee.com/nf/all-results
    Dim doc As New HtmlAgilityPack.HtmlDocument()

    Private Sub Button1_Click(sender As Object, e As EventArgs) Handles Button1.Click
        If ListBox1.Items.Count = 0 Then
            MsgBox("Please Extract Links First")
        Else
            ListBox1.SelectedIndex = 0
        End If
    End Sub

    Private Sub Button2_Click(sender As Object, e As EventArgs) Handles Button2.Click
        ScrapLinks()
    End Sub

    Private Function ScrapLinks()
        Dim hw As New HtmlWeb()
        Try
            doc = hw.Load(TextBox1.Text)
            doc.LoadHtml(doc.DocumentNode.SelectSingleNode("//*[@id='ad_list']").InnerHtml())
            For Each link As HtmlNode In doc.DocumentNode.SelectNodes("//a[@href]")
                Dim hrefValue As String = link.GetAttributeValue("href", String.Empty)
                If hrefValue.Contains("/detail/") Then
                    ListBox1.Items.Add(hrefValue)
                End If
            Next
            Dim items(ListBox1.Items.Count - 1) As Object
            ListBox1.Items.CopyTo(items, 0)
            ListBox1.Items.Clear()
            ListBox1.Items.AddRange(items.AsEnumerable().Distinct().ToArray())
            lbllinks.Text = ListBox1.Items.Count
        Catch ex As Exception
            MsgBox("Error " + ex.Message)
        End Try
        Return Nothing
    End Function

    Private Sub ListBox1_SelectedIndexChanged(sender As Object, e As EventArgs) Handles ListBox1.SelectedIndexChanged
        Try
            Dim re As New Regex("(\+92|0092)-?\d{3}-?\d{7}|\d{11}|\d{4}-\d{7}")
            ' For Each link As String In ListBox1.Items
            Dim hw As New HtmlWeb()
            doc = hw.Load(ListBox1.SelectedItem)
            Dim data = doc.DocumentNode.SelectSingleNode("//*[@class='det_ad f_left']").InnerText
            ' For Each match As Match In re.Matches(data)
            TextBox2.Text = Data
            ' Next
            'Next
        Catch ex As Exception
            MsgBox("Error " + ex.Message)
        End Try
    End Sub
Here is a sample of the output I get:
03152405552 03152405552 03152405552 03152405552 03152405552 03152405552
Answer 0 (score: 0)
Try using this code:
    Try
        ' Visit every link in the list rather than only the selected item.
        For Each link As String In ListBox1.Items
            Dim hw As New HtmlWeb()
            doc = hw.Load(link)
            Dim node = doc.DocumentNode.SelectSingleNode("//*[@class='det_ad f_left']")
            ' SelectSingleNode returns Nothing when the page has no such element.
            If node Is Nothing Then Continue For
            For Each match As Match In Regex.Matches(node.InnerText, "(\+92|0092)-?\d{3}-?\d{7}|\d{11}|\d{4}-\d{7}")
                TextBox2.Text += vbNewLine & match.Value
            Next
        Next
    Catch ex As Exception
        MsgBox("Error " + ex.Message)
    End Try
The idea is to run the regular expression afresh on each newly loaded input, so that nothing is cached from a previous page.
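
However the pages are loaded, it can help to pull the matching step into a small function that can be checked without any network access. The following is only a minimal sketch under that assumption: the module name, the `ExtractPhoneNumbers` helper and the sample text are all hypothetical and not part of the original code. Only the pattern is taken from the post. It also drops repeated matches, which may or may not be what you want.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module PhoneExtractionSketch
    ' Same pattern as in the code above.
    Private Const PhonePattern As String = "(\+92|0092)-?\d{3}-?\d{7}|\d{11}|\d{4}-\d{7}"

    ' Hypothetical helper: returns each distinct number found in the text,
    ' in order of first appearance.
    Function ExtractPhoneNumbers(pageText As String) As List(Of String)
        Dim found As New List(Of String)()
        For Each m As Match In Regex.Matches(pageText, PhonePattern)
            If Not found.Contains(m.Value) Then
                found.Add(m.Value)
            End If
        Next
        Return found
    End Function

    Sub Main()
        ' Made-up sample text standing in for the InnerText of one ad page.
        Dim sample As String = "Call 03152405552 or 0315-2405552. Again: 03152405552"
        For Each number As String In ExtractPhoneNumbers(sample)
            Console.WriteLine(number)
        Next
    End Sub
End Module
```

In the form, the loop body could then pass each page's `InnerText` to this helper and append the returned values to `TextBox2`. If the same number still appears for every link, that would suggest the same page is being loaded each time rather than a problem with the regular expression.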