Regex \n doesn't work

Time: 2013-05-29 11:32:45

Tags: html regex vb.net

I'm trying to parse text that spans two lines of HTML.

Dim PattStats As New Regex("class=""head"">(.+?)</td>"+ 
                           "\n<td>(.+?)</td>")
Dim makor As MatchCollection = PattStats.Matches(page)

For Each MatchMak As Match In makor
    ListView3.Items.Add(MatchMak.Groups(1).Value)
Next

I added \n to match the next line, but for some reason it doesn't work. Here is the source I'm running the regex against.

<table class="table table-striped table-bordered table-condensed">
  <tbody>
    <tr>
      <td class="head">Health Points:</td>
      <td>445 (+85 / per level)</td>
      <td class="head">Health Regen:</td>
      <td>7.25</td>
    </tr>
    <tr>
      <td class="head">Energy:</td>
      <td>200</td>
      <td class="head">Energy Regen:</td>
      <td>50</td>
    </tr>
    <tr>
      <td class="head">Damage:</td>
      <td>53 (+3.2 / per level)</td>
      <td class="head">Attack Speed:</td>
      <td>0.694 (+3.1 / per level)</td>
    </tr>           
    <tr>
      <td class="head">Attack Range:</td>
      <td>125</td>
      <td class="head">Movement Speed:</td>
      <td>325</td>
    </tr>
    <tr>
      <td class="head">Armor:</td>
      <td>16.5 (+3.5 / per level)</td>
      <td class="head">Magic Resistance:</td>
      <td>30 (+1.25 / per level)</td>
    </tr>       
    <tr>
      <td class="head">Influence Points (IP):</td>
      <td>3150</td>
      <td class="head">Riot Points (RP):</td>
      <td>975</td>
    </tr>
  </tbody>
</table>

I want to match the first <td class...> and the line that follows it in a single regex :/

1 Answer:

Answer 0: (score: 1)

Description

This regex finds the td tags and returns them in groups of two.

<td\b[^>]*>([^<]*)<\/td>[^<]*<td\b[^>]*>([^<]*)<\/td>


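Summary

  • <td\b[^>]*> finds the first td tag and consumes any attributes
  • ([^<]*) captures the first cell's inner text; this can be greedy because we assume the cell contains no nested tags
  • <\/td> finds the closing tag
  • [^<]* skips the text that follows, up to the next tag; this assumes there are no other tags between the first and second td tags
  • <td\b[^>]*> finds the second td tag and consumes any attributes
  • ([^<]*) captures the second cell's inner text; this can be greedy because we assume the cell contains no nested tags
  • <\/td> finds the closing tag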
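
The same idea can be applied to the pattern from the question. There, the literal \n is followed directly by <td>, but in the sample source the line break is followed by indentation (and may be preceded by a carriage return), which is most likely why nothing matches. The following is a minimal sketch, not a definitive fix: it assumes a console program, and the module name and the two-line sample string are hypothetical stand-ins for the real page.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module HeadCellDemo
    Sub Main()
        ' Hypothetical sample input: two lines copied from the question's table.
        Dim page As String =
            "      <td class=""head"">Health Points:</td>" & vbCrLf &
            "      <td>445 (+85 / per level)</td>" & vbCrLf

        ' \s* consumes the line break and the indentation between the two cells,
        ' where the original pattern allowed only a single \n.
        Dim pattern As New Regex("class=""head"">(.+?)</td>\s*<td>(.+?)</td>")

        For Each m As Match In pattern.Matches(page)
            ' Groups(1) is the label cell, Groups(2) is the value cell.
            ' Prints: Health Points: 445 (+85 / per level)
            Console.WriteLine("{0} {1}", m.Groups(1).Value, m.Groups(2).Value)
        Next
    End Sub
End Module
```

Unlike the general td pattern, this variant only pairs a cell marked class="head" with a plain <td> that follows it, so it is stricter about the markup it accepts.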

Group 0 gets the entire matched string

  1. Group 1 holds the first td's inner text
  2. Group 2 holds the second td's inner text

VB.NET code example:

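    Imports System
    Imports System.Text.RegularExpressions

    Module Module1
      Sub Main()
        Dim sourcestring As String = "replace with your source string"
        ' Pair up consecutive td cells: group 1 is the first cell's text, group 2 the second's.
        Dim re As Regex = New Regex("<td\b[^>]*>([^<]*)<\/td>[^<]*<td\b[^>]*>([^<]*)<\/td>", RegexOptions.IgnoreCase Or RegexOptions.Singleline)
        Dim mc As MatchCollection = re.Matches(sourcestring)
        ' Group names are "0", "1", "2" here because the pattern has no named groups.
        Dim groupNames As String() = re.GetGroupNames()
        Dim mIdx As Integer = 0
        For Each m As Match In mc
          ' Print every group of this match, including group 0 (the whole match).
          For groupIdx As Integer = 0 To m.Groups.Count - 1
            Console.WriteLine("[{0}][{1}] = {2}", mIdx, groupNames(groupIdx), m.Groups(groupIdx).Value)
          Next
          mIdx += 1
        Next
      End Sub
    End Module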
    
    $matches Array:
    (
        [0] => Array
            (
                [0] => <td class="head">Health Points:</td>
              <td>445 (+85 / per level)</td>
                [1] => <td class="head">Health Regen:</td>
              <td>7.25</td>
                [2] => <td class="head">Energy:</td>
              <td>200</td>
                [3] => <td class="head">Energy Regen:</td>
              <td>50</td>
                [4] => <td class="head">Damage:</td>
              <td>53 (+3.2 / per level)</td>
                [5] => <td class="head">Attack Speed:</td>
              <td>0.694 (+3.1 / per level)</td>
                [6] => <td class="head">Attack Range:</td>
              <td>125</td>
                [7] => <td class="head">Movement Speed:</td>
              <td>325</td>
                [8] => <td class="head">Armor:</td>
              <td>16.5 (+3.5 / per level)</td>
                [9] => <td class="head">Magic Resistance:</td>
              <td>30 (+1.25 / per level)</td>
                [10] => <td class="head">Influence Points (IP):</td>
              <td>3150</td>
                [11] => <td class="head">Riot Points (RP):</td>
              <td>975</td>
            )
    
        [1] => Array
            (
                [0] => Health Points:
                [1] => Health Regen:
                [2] => Energy:
                [3] => Energy Regen:
                [4] => Damage:
                [5] => Attack Speed:
                [6] => Attack Range:
                [7] => Movement Speed:
                [8] => Armor:
                [9] => Magic Resistance:
                [10] => Influence Points (IP):
                [11] => Riot Points (RP):
            )
    
        [2] => Array
            (
                [0] => 445 (+85 / per level)
                [1] => 7.25
                [2] => 200
                [3] => 50
                [4] => 53 (+3.2 / per level)
                [5] => 0.694 (+3.1 / per level)
                [6] => 125
                [7] => 325
                [8] => 16.5 (+3.5 / per level)
                [9] => 30 (+1.25 / per level)
                [10] => 3150
                [11] => 975
            )
    
    )
    

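Disclaimer

Parsing HTML with a regular expression isn't really the best solution, because there are a lot of edge cases we can't predict. In this case, though, if the input string is always this basic and you're willing to accept the risk that the regex won't work 100% of the time, this solution will probably work for you.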
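
For comparison, here is a rough sketch of the parser-based route. It swaps in a different technique from the answer above: it reads the fragment with System.Xml.Linq, which only works if the table is well-formed XML, as the sample in the question happens to be. Real-world HTML often is not, and would need a dedicated HTML parser instead. The module name and the sample string are hypothetical.

```vbnet
Imports System
Imports System.Linq
Imports System.Xml.Linq

Module TableParseDemo
    Sub Main()
        ' Hypothetical, well-formed fragment modelled on the question's table.
        Dim html As String =
            "<table><tbody><tr>" &
            "<td class=""head"">Energy:</td><td>200</td>" &
            "<td class=""head"">Energy Regen:</td><td>50</td>" &
            "</tr></tbody></table>"

        ' Assumption: the input parses as XML; XElement.Parse throws otherwise.
        Dim table As XElement = XElement.Parse(html)

        For Each head As XElement In table.Descendants("td")
            ' Only label cells carry class="head"; skip everything else.
            Dim cls As XAttribute = head.Attribute("class")
            If cls Is Nothing OrElse cls.Value <> "head" Then Continue For

            ' The value cell is the next <td> sibling after the label cell.
            Dim valueCell As XElement = head.ElementsAfterSelf("td").FirstOrDefault()
            If valueCell IsNot Nothing Then
                Console.WriteLine("{0} {1}", head.Value, valueCell.Value)
            End If
        Next
    End Sub
End Module
```

Because the pairing follows the element tree rather than the raw text, line breaks and indentation between cells no longer matter.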