Splitting a sentence into groups using Regexp

Time: 2015-09-02 14:58:21

Tags: regex

<i>Location     Status     Alarm           Plug-In    Time

RST2         Set CR     Link to RST1                  4/12/94 08:14:22
LET 1 11     Set MJ     LOS OP            T1X-XCVR      4/10/94 10:17:45</i>

The above is the message.

I want to split the message above into groups using a regexp.

I am trying this:

<i>(.*)\s+(Set|Clr|Cur)\s+([A-Z]+)\s+(.*)\s+(\d+\/\d+\/\d+\s+\d\d:\d\d:\d\d)</i>

and the output for the first line is:

1.  [0-12]  `RST2        `   -- should not have trailing whitespace (should be only "RST2")
2.  [13-16] `Set`
3.  [17-19] `CR`
4.  [24-50] `Link to RST1        GHT         `  -- should contain only "Link to RST1"; GHT should be in another group
5.  [51-67] `4/12/94 08:14:22`

For the second line:

1.  [0-12]  `LET 1 11    `  -- should not have trailing whitespace (should be only "LET 1 11")
2.  [13-16] `Set`
3.  [17-19] `MJ`
4.  [24-50] `LOS OP          T1X-XCVR  `   -- this group should contain only "LOS OP" without trailing whitespace; T1X-XCVR should be in another group
5.  [51-67] `4/10/94 10:17:45`

Is there a way to get the output I want? I tried `+$` to remove the whitespace, but it did not work.

Expected output

    1.  [0-12]  `RST2` 
    2.  [13-16] `Set`
    3.  [17-19] `CR`
    4.  [24-50] `Link to RST1`
    5.  [frm-to] ``
    6.  [51-67] `4/12/94 08:14:22`


    1.  [0-12]  `LET 1 11`
    2.  [13-16] `Set`
    3.  [17-19] `MJ`
    4.  [24-50] `LOS OP`
    5.  [frm-to] `T1X-XCVR`
    6.  [51-67] `4/10/94 10:17:45`

1 answer:

Answer 0 (score: 2)

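The trick is to be lazy rather than greedy: `^(.*?)\s+(Set|Clr|Cur)\s+([A-Z]+)\s+(\w+(?: \w+)*)\s+(.*?)\s+(\d+\/\d+\/\d+\s+\d\d:\d\d:\d\d)(?:<\/i>)?$` works for both of your input lines. The lazy `(.*?)` stops at the first point where the rest of the pattern can match, so the trailing whitespace is consumed by `\s+` instead of ending up in the group. However, it may fail on other inputs, for example when the "Plug-In" string is separated from the "Alarm" string by only a single space. In that case groups 4 and 5 cannot be told apart reliably, and you need a proper delimiter or fixed column lengths.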
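As a rough illustration, the pattern above can be tried in Python against the two sample lines. This is only a sketch: the function name `split_alarm_line` and the exact spacing of the sample strings are assumptions made for the example, and the last input is a made-up variation of the first line that shows the single-space limitation described above.

```python
import re

# The pattern from the answer, copied verbatim and split over two string literals.
PATTERN = re.compile(
    r"^(.*?)\s+(Set|Clr|Cur)\s+([A-Z]+)\s+(\w+(?: \w+)*)\s+(.*?)\s+"
    r"(\d+\/\d+\/\d+\s+\d\d:\d\d:\d\d)(?:<\/i>)?$"
)


def split_alarm_line(line):
    """Return the six captured groups as a tuple, or None if the line does not match.

    `split_alarm_line` is a hypothetical helper name used only in this sketch.
    """
    match = PATTERN.match(line)
    return match.groups() if match else None


# Sample lines modelled on the question; the column spacing is an assumption.
lines = [
    "RST2         Set CR     Link to RST1                  4/12/94 08:14:22",
    "LET 1 11     Set MJ     LOS OP            T1X-XCVR      4/10/94 10:17:45",
]

# Made-up variation: the plug-in name "GHT" is only one space away from the alarm text.
ambiguous = "RST2         Set CR     Link to RST1 GHT      4/12/94 08:14:22"

if __name__ == "__main__":
    for line in lines + [ambiguous]:
        print(split_alarm_line(line))
```

For the first line the fifth group comes back as an empty string, and for the second it holds `T1X-XCVR`. In the made-up third line, `GHT` is absorbed into group 4 because a single space is also a legal separator inside the alarm text, which is why a real delimiter or fixed column widths would be more robust.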

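Alternatively, if you are reading this from a website, check whether the DOM is built dynamically from a JSON resource; if so, try to fetch that resource directly.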