Question

I have a string:

<a href = "http://www.zigwheels.com/reviews/long-term-reviews/fiat-linea/8804-100-1/1">
  <img src="http://static.zigwheels.com/media/content/2011/Jul/fiatlinealt_1_560x420.jpg" />
</a> 
<p>
  To sum it up in a nutshell, the Fiat Linea is a spacious family car that 
  rewards you with its space and fuel efficiency, while maintaining 
  decent levels of performance as well
</p>

I only need the text inside the <p> tag. Please help. I need this in pure VB for a VB.NET Windows application.

Answer 1

It depends on the input data, but for a simple case like this one you can use a regular expression that matches the text between the tags.

Imports System.Text.RegularExpressions

Dim input As String = ... ' Your string
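' .*? stops at the first </p>; Singleline lets "." match the line breaks inside the tag.
Dim match As Match = Regex.Match(input, "<p>(?<content>.*?)</p>", RegexOptions.Singleline)
If match.Success Then
    Dim content As String = match.Groups("content").Value.Trim() ' The text between <p> and </p>, whitespace trimmed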
End If

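This is of course not a solution for parsing HTML in general, because for that you need an HTML parser. But it can be used to match very simple strings like the one you provided. If the strings you are matching are more complex, or you need more sophisticated matching, you will need a different solution.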
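
As a rough illustration of how far the regex approach can be stretched, here is a minimal sketch that collects the text of every <p> element in a string, including tags that carry attributes or span several lines. The module name `ParagraphExtractor`, the helper `ExtractParagraphs`, and the sample HTML are made up for this example; it still assumes simple, non-nested markup and is not a substitute for a real parser.

```vb
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module ParagraphExtractor
    ' Hypothetical helper: returns the trimmed text of every <p>...</p> pair.
    Function ExtractParagraphs(html As String) As List(Of String)
        Dim results As New List(Of String)()
        ' Singleline lets "." match newlines; IgnoreCase also accepts <P>.
        Dim options As RegexOptions = RegexOptions.Singleline Or RegexOptions.IgnoreCase
        ' <p\b[^>]*> allows attributes but rejects tags such as <pre>.
        ' .*? is lazy, so each match ends at the nearest </p>.
        Dim pattern As String = "<p\b[^>]*>(?<content>.*?)</p>"
        For Each m As Match In Regex.Matches(html, pattern, options)
            results.Add(m.Groups("content").Value.Trim())
        Next
        Return results
    End Function

    Sub Main()
        ' Made-up sample input with two paragraphs, one spanning several lines.
        Dim html As String = "<a href=""#""><img src=""x.jpg"" /></a>" & Environment.NewLine &
                             "<p>" & Environment.NewLine &
                             "  First paragraph" & Environment.NewLine &
                             "</p>" & Environment.NewLine &
                             "<p class=""note"">Second</p>"
        For Each paragraph As String In ExtractParagraphs(html)
            Console.WriteLine(paragraph)
        Next
    End Sub
End Module
```

With the sample input above, this prints "First paragraph" and "Second" on separate lines. Nested tags inside a paragraph would be returned as raw markup, which is one of the cases where an HTML parser is the better choice.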

Answer 2

You can use the HTML Agility Pack. Here is an example:

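// Requires the HtmlAgilityPack package, plus: using System.Linq; using HtmlAgilityPack;
HtmlDocument htmlDoc = new HtmlDocument();
htmlDoc.LoadHtml("Get the entire string here");
// Select the inner text of every <p> element in the document.
var xyz = from x in htmlDoc.DocumentNode.DescendantNodes()
          where x.Name == "p"
          select x.InnerText;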

This way you can get the values you need. You can find more help at the following link.

http://htmlagilitypack.codeplex.com/

Edit: VB.NET

Dim htmlDoc As New HtmlDocument()
htmlDoc.LoadHtml("Get the entire string here")
' Select the inner text of every <p> element in the document.
Dim xyz = From x In htmlDoc.DocumentNode.DescendantNodes() Where x.Name = "p" Select x.InnerText

Extract text from the <p> HTML tag
