I have the following code to grab the div elements:
For Each ele As HtmlElement In WebBrowser1.Document.GetElementsByTagName("div")
    If ele.GetAttribute("className").Contains("description") Then
        Dim content As String = ele.InnerHtml
        If content.Contains("http://myserver.com/image/check.png") Then
            'Do stuff if image exists
        Else
            'Do stuff if image doesn't exist
        End If
    End If
Next
The div element looks like this:
<DIV class=headline><SPAN class=blue-title-lg>TITLE_HERE
</SPAN> LOCATION1_HERE, LOCATION2_HERE</DIV>DESCRIPTION_HERE<BR>
<DIV class=about><A class=link href="viewprofile.aspx?profile_id=00000000">USERNAME</A> 20 FSM - Friends <FONT color=green>Online Today</FONT></DIV>
When the check-mark image is not present, I want to grab the URL from this element:
<a class=link href="viewprofile.aspx?profile_id=00000000"></a>
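and put it into a string. This is where I've hit a brick wall and need some help. I think a regex solution would solve my problem, but regular expressions are one of my weak points. Can anyone put me out of my misery?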
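Since the question asks for a regular expression, here is a minimal sketch of that route. It assumes the profile link always has the form viewprofile.aspx?profile_id= followed by digits, as in the sample markup above; the module and function names (ProfileLinkExtractor, ExtractProfileLink) are hypothetical. The pattern is deliberately not anchored on href=, so it should still find the link if the control reports it as part of an absolute URL in InnerHtml.

```vb
Imports System
Imports System.Text.RegularExpressions

Module ProfileLinkExtractor
    ' Hypothetical pattern based on the sample markup: "viewprofile.aspx?profile_id="
    ' followed by one or more digits. Not anchored on href=, so it also finds the
    ' link inside an absolute URL.
    Private ReadOnly ProfileLinkPattern As New Regex("viewprofile\.aspx\?profile_id=\d+", RegexOptions.IgnoreCase)

    ' Returns the first profile link in a div's InnerHtml, or Nothing if there is none.
    Function ExtractProfileLink(innerHtml As String) As String
        Dim m As Match = ProfileLinkPattern.Match(innerHtml)
        If m.Success Then Return m.Value
        Return Nothing
    End Function

    Sub Main()
        Dim sample As String = "<DIV class=about><A class=link href=""viewprofile.aspx?profile_id=00000000"">USERNAME</A> 20 FSM - Friends</DIV>"
        Console.WriteLine(ExtractProfileLink(sample)) ' prints viewprofile.aspx?profile_id=00000000
    End Sub
End Module
```

In the question's loop, ExtractProfileLink(content) would be called in the Else branch (image not present) and its result stored in a String; a Nothing result means that div held no profile link.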
Answer 0 (score: 0)
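Solved!
I slept on it and came up with a very simple way to solve it. My app's UI now looks like a mess, but I'll sort that out later. I have the information I need.
Here's how I did it: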
Dim PageElement As HtmlElementCollection = WebBrowser1.Document.GetElementsByTagName("a")
' First pass: collect every distinct profile link on the page into ListBox1
For Each CurElement As HtmlElement In PageElement
    Dim linkunverified As String = CurElement.GetAttribute("href")
    If linkunverified.Contains("viewprofile.aspx") Then
        If Not ListBox1.Items.Contains(linkunverified) Then
            ListBox1.Items.Add(linkunverified)
        End If
    End If
Next

' Second pass: for each "description" div without the check-mark image,
' find which collected profile links appear inside it
For Each ele As HtmlElement In WebBrowser1.Document.GetElementsByTagName("div")
    If ele.GetAttribute("className").Contains("description") Then
        Dim content As String = ele.InnerHtml
        If Not content.Contains("http://pics.myserver.com/image/check.png") Then
            For i As Integer = 0 To ListBox1.Items.Count - 1
                ' The collected hrefs start with the 24-character prefix
                ' "http://www.myserver.com/"; Remove(0, 24) strips it, leaving the
                ' relative "viewprofile.aspx?..." form that appears in the markup
                Dim relativeLink As String = CStr(ListBox1.Items(i)).Remove(0, 24)
                If content.Contains(relativeLink) Then
                    ListBox2.Items.Add("http://www.myserver.com/" & relativeLink)
                End If
            Next
        End If
    End If
Next
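The answer's matching logic can also be sketched without the two ListBox controls and the hard-coded Remove(0, 24). This version assumes the href values are absolute URLs (as the 24-character prefix in the answer implies) and uses Uri.PathAndQuery to get the relative viewprofile.aspx?... part whatever the host name is. FindUnverifiedProfiles and its parameters are hypothetical names; in the WebBrowser code, hrefs would be filled by the first loop and the function called once per description div's InnerHtml.

```vb
Imports System
Imports System.Collections.Generic

Module ProfileMatcher
    ' Hypothetical helper: returns the absolute profile hrefs whose relative form
    ' appears in one "description" div, unless that div shows the check-mark image.
    Function FindUnverifiedProfiles(hrefs As IEnumerable(Of String),
                                    descriptionHtml As String,
                                    checkImageUrl As String) As List(Of String)
        Dim result As New List(Of String)()
        ' Verified profiles (check-mark image present) are skipped entirely
        If descriptionHtml.Contains(checkImageUrl) Then Return result

        Dim seen As New HashSet(Of String)()
        For Each href As String In hrefs
            If Not href.Contains("viewprofile.aspx") Then Continue For
            ' Skip anything that is not an absolute URL instead of throwing
            Dim parsed As Uri = Nothing
            If Not Uri.TryCreate(href, UriKind.Absolute, parsed) Then Continue For
            ' "/viewprofile.aspx?profile_id=..." -> "viewprofile.aspx?profile_id=...",
            ' whatever the host name is (replaces the fixed Remove(0, 24))
            Dim relative As String = parsed.PathAndQuery.TrimStart("/"c)
            ' HashSet.Add returns False for duplicates, so each href is reported once
            If descriptionHtml.Contains(relative) AndAlso seen.Add(href) Then
                result.Add(href)
            End If
        Next
        Return result
    End Function

    Sub Main()
        Dim hrefs As New List(Of String) From {
            "http://www.myserver.com/viewprofile.aspx?profile_id=00000000",
            "http://www.myserver.com/viewprofile.aspx?profile_id=11111111"
        }
        Dim html As String = "<A class=link href=""viewprofile.aspx?profile_id=00000000"">USERNAME</A>"
        For Each link As String In FindUnverifiedProfiles(hrefs, html, "http://pics.myserver.com/image/check.png")
            Console.WriteLine(link) ' prints only the profile_id=00000000 href
        Next
    End Sub
End Module
```

Like the original, this is a plain substring test, so a profile id that is a prefix of another (for example 123 and 1234) could produce a false match.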