<i>Location Status Alarm Plug-In Time
RST2 Set CR Link to RST1 4/12/94 08:14:22
LET 1 11 Set MJ LOS OP T1X-XCVR 4/10/94 10:17:45</i>
Given the message above, I want to split each line into groups using a regexp.
This is what I am trying:
<i>(.*)\s+(Set|Clr|Cur)\s+([A-Z]+)\s+(.*)\s+(\d+\/\d+\/\d+\s+\d\d:\d\d:\d\d)</i>
The output for the first line is:
1. [0-12] `RST2 ` -- should not have whitespaces ( should be only "RST2")
2. [13-16] `Set`
3. [17-19] `CR`
4. [24-50] `Link to RST1 GHT ` -- should contain only "Link to RST1" GHT should be in another group
5. [51-67] `4/12/94 08:14:22`
For the second line:
1. [0-12] `LET 1 11 ` -- should not have trailing whitespaces ( should be only "LET 1 11")
2. [13-16] `Set`
3. [17-19] `MJ`
4. [24-50] `LOS OP T1X-XCVR ` ---- This group should have only "LOS OP" without any whitespace and T1X-XCVR should be in another group
5. [51-67] `4/10/94 10:17:45`
Is there a way to get the output I want? I tried adding `+ $` to remove the whitespace, but it did not work.
Expected output:
1. [0-12] `RST2`
2. [13-16] `Set`
3. [17-19] `CR`
4. [24-50] `Link to RST1`
5. [frm-to] ``
6. [51-67] `4/12/94 08:14:22`
1. [0-12] `LET 1 11`
2. [13-16] `Set`
3. [17-19] `MJ`
4. [24-50] `LOS OP`
5. [frm-to] `T1X-XCVR`
6. [51-67] `4/10/94 10:17:45`</i>
Answer 0 (score: 2)
The trick is to be lazy rather than greedy:
^(.*?)\s+(Set|Clr|Cur)\s+([A-Z]+)\s+(\w+(?: \w+)*)\s+(.*?)\s+(\d+\/\d+\/\d+\s+\d\d:\d\d:\d\d)(?:<\/i>)?$
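This works for both of your input lines. However, it may fail on other inputs, for example when the "Plug-In" string is separated from the "Alarm" string by only a single space. In that case groups 4 and 5 cannot be told apart reliably, and you need a proper delimiter or fixed column widths.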
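As a rough illustration, here is a minimal Python sketch that applies the pattern above unchanged. The column padding in `LINES` is an assumption (the post as shown has the spacing collapsed, so runs of spaces between columns are reconstructed by guesswork), and `split_line` and `AMBIGUOUS` are hypothetical names used only for this example.

```python
import re

# The pattern from the answer, unchanged (split over two literals for width).
PATTERN = re.compile(
    r"^(.*?)\s+(Set|Clr|Cur)\s+([A-Z]+)\s+(\w+(?: \w+)*)\s+(.*?)\s+"
    r"(\d+\/\d+\/\d+\s+\d\d:\d\d:\d\d)(?:<\/i>)?$"
)

# ASSUMPTION: columns are padded with runs of spaces; the exact widths
# here are invented for illustration, not taken from the real device output.
LINES = [
    "RST2         Set CR     Link to RST1               4/12/94 08:14:22",
    "LET 1 11     Set MJ     LOS OP       T1X-XCVR      4/10/94 10:17:45",
]

# Hypothetical line where the Plug-In value ("GHT") follows the alarm text
# after a single space, the failure case described above.
AMBIGUOUS = "RST2         Set CR     Link to RST1 GHT           4/12/94 08:14:22"


def split_line(line):
    """Return the six captured groups as a tuple, or None if no match."""
    match = PATTERN.match(line)
    return match.groups() if match else None


if __name__ == "__main__":
    for line in LINES + [AMBIGUOUS]:
        print(split_line(line))
```

With this padding, the lazy `(.*?)` groups no longer carry trailing whitespace, and the empty Plug-In column on the first line comes back as an empty string. On the `AMBIGUOUS` line, `GHT` is absorbed into group 4 and group 5 stays empty, which is why a real delimiter or fixed column widths are needed for such input.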
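Alternatively, if you are reading this from a website, check whether the DOM is built dynamically from a JSON resource. If it is, try to fetch that resource directly.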