Using regular expressions in VB.NET to extract phone numbers

Asked: 2014-09-10 16:41:39

Tags: html regex vb.net html-agility-pack

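I wrote this code to extract mobile numbers from web links. Basically, I have three links in a list box and use the code below to fetch the source of each one. However, when I try to extract the phone numbers with a regex, I get the same number over and over. Here is the full code I wrote. The site I extract the links from is: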

http://bolee.com/nf/all-results

Dim doc As New HtmlAgilityPack.HtmlDocument()

Private Sub Button1_Click(sender As Object, e As EventArgs) Handles Button1.Click
    If ListBox1.Items.Count = 0 Then
        MsgBox("Please Extract Links First")
    Else
        ListBox1.SelectedIndex = 0
    End If
End Sub

Private Sub Button2_Click(sender As Object, e As EventArgs) Handles Button2.Click
    ScrapLinks()
End Sub



Private Function ScrapLinks()
    Dim hw As New HtmlWeb()
    Try
        doc = hw.Load(TextBox1.Text)
        doc.LoadHtml(doc.DocumentNode.SelectSingleNode("//*[@id='ad_list']").InnerHtml())

        For Each link As HtmlNode In doc.DocumentNode.SelectNodes("//a[@href]")

            Dim hrefValue As String = link.GetAttributeValue("href", String.Empty)

            If hrefValue.Contains("/detail/") Then
                ListBox1.Items.Add(hrefValue)
            End If
        Next

        Dim items(ListBox1.Items.Count - 1) As Object
        ListBox1.Items.CopyTo(items, 0)
        ListBox1.Items.Clear()
        ListBox1.Items.AddRange(items.AsEnumerable().Distinct().ToArray())
        lbllinks.Text = ListBox1.Items.Count

    Catch ex As Exception
        MsgBox("Error " + ex.Message)

    End Try
    Return Nothing

End Function
Private Sub ListBox1_SelectedIndexChanged(sender As Object, e As EventArgs) Handles ListBox1.SelectedIndexChanged
        Try
        Dim re As New Regex("(\+92|0092)-?\d{3}-?\d{7}|\d{11}|\d{4}-\d{7}")

        ' For Each link As String In ListBox1.Items

        Dim hw As New HtmlWeb()
        doc = hw.Load(ListBox1.SelectedItem)
        Dim data = doc.DocumentNode.SelectSingleNode("//*[@class='det_ad f_left']").InnerText

        '    For Each match As Match In re.Matches(data)

        TextBox2.Text = Data


        '    Next
        'Next

    Catch ex As Exception
        MsgBox("Error " + ex.Message)

    End Try
End Sub

Here is a sample of the output I get:

03152405552 03152405552 03152405552 03152405552 03152405552 03152405552

1 Answer:

Answer 0 (score: 0)

Try this code:

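Try
    Dim hw As New HtmlWeb()

    ' Load each extracted link directly. Stepping ListBox1.SelectedIndex inside
    ' the loop would re-fire SelectedIndexChanged and run past the last item.
    For Each link As String In ListBox1.Items
        doc = hw.Load(link)
        Dim node = doc.DocumentNode.SelectSingleNode("//*[@class='det_ad f_left']")

        ' Skip pages that do not contain the expected ad container.
        If node Is Nothing Then Continue For

        ' Append every phone number found on this page.
        For Each match As Match In Regex.Matches(node.InnerText, "(\+92|0092)-?\d{3}-?\d{7}|\d{11}|\d{4}-\d{7}")
            TextBox2.Text &= vbNewLine & match.Value
        Next
    Next

Catch ex As Exception
    MsgBox("Error " & ex.Message)

End Try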

The idea is to run a fresh regular expression match against each newly loaded page, so that no earlier result is reused.
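To check the pattern on its own, without the network or the form, a small console sketch like the one below may help. It applies the same regular expression to a made-up string and drops repeated matches with Distinct. The module name, the ExtractPhones helper and the sample numbers are all hypothetical and only serve as illustration; whether duplicates should be dropped depends on what you want in TextBox2.

```vbnet
Imports System
Imports System.Linq
Imports System.Text.RegularExpressions

Module PhoneRegexDemo
    ' Same pattern as in the question and the answer.
    Private Const PhonePattern As String = "(\+92|0092)-?\d{3}-?\d{7}|\d{11}|\d{4}-\d{7}"

    ' Hypothetical helper: returns each distinct phone number found in the text,
    ' in order of first appearance.
    Function ExtractPhones(text As String) As String()
        Return Regex.Matches(text, PhonePattern).
            Cast(Of Match)().
            Select(Function(m) m.Value).
            Distinct().
            ToArray()
    End Function

    Sub Main()
        ' Made-up page text; these numbers are placeholders, not real data.
        Dim sample As String =
            "Call 03001234567 or +92-300-1234567. Office: 0300-1234567. Again: 03001234567"

        For Each phone As String In ExtractPhones(sample)
            Console.WriteLine(phone)
        Next
    End Sub
End Module
```

If the same number still appears several times in the real output, it is worth checking whether the page text itself repeats it before suspecting the regex.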