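I wrote the code below to extract cell phone numbers from a website. It extracts the numbers correctly, but it is very slow, and it hangs my form while it is extracting. Please help me make it run faster and more efficiently.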
    Imports HtmlAgilityPack
    Imports System.Text.RegularExpressions

    Public Class Extractor
        Shared doc As New HtmlAgilityPack.HtmlDocument()

        Public Shared Function ScrapLinks(TextBox1 As TextBox, ListBox1 As ListBox, lbllinks As Label)
            Dim hw As New HtmlWeb()
            Try
                doc = hw.Load(TextBox1.Text)
                doc.LoadHtml(doc.DocumentNode.SelectSingleNode("//*[@id='ad_list']").InnerHtml())
                For Each link As HtmlNode In doc.DocumentNode.SelectNodes("//a[@href]")
                    Dim hrefValue As String = link.GetAttributeValue("href", String.Empty)
                    If hrefValue.Contains("/detail/") Then
                        If Not ListBox1.Items.Contains(hrefValue) Then
                            ListBox1.Items.Add(hrefValue)
                        End If
                    End If
                Next
            Catch ex As Exception
                MsgBox("Error " + ex.Message)
            End Try
            Return Nothing
        End Function

        Public Shared Function Scrapnums(lstbox As ListBox, lstnum As ListBox)
            Try
                Dim hw As New HtmlWeb()
                doc = hw.Load(lstbox.SelectedItem)
                Dim data = doc.DocumentNode.SelectSingleNode("//*[@class='det_ad f_left']").InnerText
                Dim m As Match = Regex.Match(data, "(\+92|0092)-?\d{3}-?\d{7}|\d{11}|\d{4}-\d{7}")
                If Not lstnum.Items.Contains(m.Value) Then
                    lstnum.Items.Add(m.Value)
                End If
            Catch ex As Exception
            End Try
            Return Nothing
        End Function
    End Class
Answer 0 (score: 1)
Here is a regex for parsing the phone numbers.
Regex
((?(?=\+92|0092)(?:\+|00)92\d?(?:-\d{3}-?\d{7}|-?\d{9}|\d{10})|(?:\d{11}|\d{3}\s\d{3}\s\d{5,6}|\d{4}-\d{7})))
Sample numbers
+92-3113143446
+923-113143446
032 124 26003
923 072 776037
03154031162
+923218923116
0307-2796038
+92-343-2842120
Result
+92-3113143446
+923-113143446
032 124 26003
923 072 776037
03154031162
+923218923116
0307-2796038
+92-343-2842120
Demo
Online Demo
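The regex above is based on assumptions, and it may match more patterns than the ones listed above, so it may need to be refined to suit your needs.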
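
As a rough illustration of how the regex might be applied, here is a minimal sketch, assuming a console program and a made-up sample string. The module name `PhoneRegexDemo` and the helper `ExtractNumbers` are hypothetical and not part of the original code. It uses `Regex.Matches` so that every number in the text is collected rather than only the first one, and it builds the `Regex` object once so it can be reused across pages.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module PhoneRegexDemo
    ' Pattern copied verbatim from the answer above.
    Private Const PhonePattern As String = "((?(?=\+92|0092)(?:\+|00)92\d?(?:-\d{3}-?\d{7}|-?\d{9}|\d{10})|(?:\d{11}|\d{3}\s\d{3}\s\d{5,6}|\d{4}-\d{7})))"

    ' Built once and reused for every call instead of being recreated per page.
    Private ReadOnly PhoneRegex As New Regex(PhonePattern, RegexOptions.Compiled)

    ' Hypothetical helper: returns each distinct number found in the text,
    ' in the order in which it first appears.
    Public Function ExtractNumbers(text As String) As List(Of String)
        Dim found As New List(Of String)()
        For Each m As Match In PhoneRegex.Matches(text)
            If Not found.Contains(m.Value) Then
                found.Add(m.Value)
            End If
        Next
        Return found
    End Function

    Sub Main()
        ' Made-up sample text; the numbers are taken from the sample list above.
        Dim sample As String = "Call 0307-2796038 or +92-343-2842120, office 03154031162."
        For Each number As String In ExtractNumbers(sample)
            Console.WriteLine(number)
        Next
    End Sub
End Module
```

With this sample string, the program should print the three numbers one per line. In the original `Scrapnums` function, the same helper could be given the `InnerText` of the ad node, although that wiring is an assumption and is not shown here.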