Question

OK, the problem is that I have an HTML string. I need to find a specific format like this:

<span class="fieldText">some text</span>

From that HTML, I need to extract "some text" and save it to a list. How can I achieve this?

Note that the text may appear like this:

<p>
    Central: 
<span class="fieldText">Central_Local</span><br>Area Resolutoria:  
<span class="fieldText">Area_Resolutoria</span><br>VPI:  
<span class="fieldText">VIP</span><br>Ciudad: <span class="fieldText">Ciudad</span>   <br>Estado:  <span class="fieldText">Estado</span><br>Region  <span class="fieldText">Region</span>    
</p>

Answer 1

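You could try the regular expression @"<span .*?>(.*?)</span>". If you combine it with captures, you can use @"^(.*?<span .*?>(.*?)</span>.*?)+$" to get the whole list in a single match.

But the truth is that you should not use regular expressions to process XML or HTML. There are plenty of parsers out there, as others have already mentioned.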

            // Requires: using System; using System.Text.RegularExpressions;
            string s = @"
<p>
    Central: 
<span class=""fieldText"">Central_Local</span><br>Area Resolutoria:  
<span class=""fieldText"">Area_Resolutoria</span><br>VPI:  
<span class=""fieldText"">VIP</span><br>Ciudad: <span class=""fieldText"">Ciudad</span>   <br>Estado:  <span class=""fieldText"">Estado</span><br>Region  <span class=""fieldText"">Region</span>    
</p>";

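            // Singleline lets '.' match newlines, so the pattern can span the whole string.
            Match m = Regex.Match(s, @"^(.*?<span .*?>(.*?)</span>.*?)+$", RegexOptions.Singleline);

            // Each repetition of the outer group adds one entry to Groups[2].Captures.
            foreach (Capture capture in m.Groups[2].Captures)
                Console.WriteLine(capture.Value);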

Answer 2

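I don't like using regular expressions for things like this.

I wrote a free HTML tag parser that you can use as-is, modify to suit your needs, or simply use as a guide for solving this problem yourself.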

Answer 3

Have you tried HtmlAgilityPack?
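
For reference, a minimal sketch of what that could look like, assuming the HtmlAgilityPack NuGet package is referenced by the project. The class and method names (FieldTextParserDemo, ExtractFieldTexts) are made up for illustration and are not part of the library.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack; // assumption: the HtmlAgilityPack NuGet package is installed

public static class FieldTextParserDemo
{
    // Hypothetical helper: collects the text of every <span class="fieldText"> element.
    public static List<string> ExtractFieldTexts(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var results = new List<string>();

        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//span[@class='fieldText']");
        if (nodes == null)
            return results;

        foreach (HtmlNode node in nodes)
            results.Add(node.InnerText.Trim());

        return results;
    }
}
```

Calling FieldTextParserDemo.ExtractFieldTexts(s) on the HTML string from the question should yield one list entry per fieldText span, in document order.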

Answer 4

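For something small like this, I prefer to use regular expressions. I'm not sure what the C# syntax is, but the expression would look like this: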

|<span class="fieldText">(.+)</span>|

Jonathan Wood's suggestion to use an HTML tag parser is also a good idea, especially if you are going to do a lot of parsing.
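
In C#, the expression above might be applied roughly as follows. This is only a sketch: the names FieldTextRegexDemo and ExtractFieldTexts are hypothetical, and the quantifier is changed from (.+) to the lazy (.+?), because the sample HTML puts several spans on one line and a greedy match would run through to the last closing tag on that line.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class FieldTextRegexDemo
{
    // Hypothetical helper: returns the inner text of every <span class="fieldText"> element.
    public static List<string> ExtractFieldTexts(string html)
    {
        var results = new List<string>();

        // (.+?) is lazy, so each match stops at the nearest closing </span>.
        foreach (Match match in Regex.Matches(html, @"<span class=""fieldText"">(.+?)</span>"))
        {
            results.Add(match.Groups[1].Value);
        }

        return results;
    }

    public static void Main()
    {
        string html = @"<p>Ciudad: <span class=""fieldText"">Ciudad</span>   <br>Estado:  <span class=""fieldText"">Estado</span></p>";

        foreach (string text in ExtractFieldTexts(html))
            Console.WriteLine(text);
    }
}
```

Regex.Matches finds every non-overlapping occurrence, so there is no need for the anchored, repeated-group pattern when a flat list of values is all that is wanted.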

Answer 5

Regex has proven to be a poor solution for parsing HTML. HTML Agility Pack is exactly what you need for this task.

Need help parsing text between HTML tags
