Split a string but keep the regex matches

Date: 2012-01-13 00:12:34

Tags: javascript regex parsing

I'm parsing some text with JavaScript. Say I have a string like this:

"hello world <1> this is some random text <3> foo <12>"

I need to put the following substrings into an array:

myArray[0] = "hello world ";
myArray[1] = "<1>";
myArray[2] = " this is some random text ";
myArray[3] = "<3>";
myArray[4] = " foo ";
myArray[5] = "<12>";

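Note that I split the string wherever a <number> sequence appears.

I tried splitting the string with the regular expression /<\d{1,3}>/, but when I do that I lose the <number> sequences. In other words, I end up with "hello world ", " this is some random text " and " foo ": the strings "<1>", "<3>" and "<12>" are gone, and I want to keep them. How can I fix this?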
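The problem is easy to reproduce. A minimal sketch of the attempt described above, splitting on the pattern without a capturing group, shows the delimiters being discarded (plus an empty string at the end, because the input ends with a delimiter):

```javascript
// Splitting on a regex WITHOUT a capturing group: the matched
// delimiters are thrown away and only the text between them is returned.
var str = "hello world <1> this is some random text <3> foo <12>";
var parts = str.split(/<\d{1,3}>/);

// parts → ["hello world ", " this is some random text ", " foo ", ""]
console.log(parts);
```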

1 answer:

Answer (score: 11)

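You need to capture the sequence to keep it. When the separator passed to split() is a regular expression that contains capturing groups, the captured text is included in the resulting array.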

var str = "hello world <1> this is some random text <3> foo <12>";

str.split(/(<\d{1,3}>)/);

// ["hello world ", "<1>", " this is some random text ", "<3>", " foo ", "<12>", ""]
// The trailing "" appears because the string ends with a match.
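The captured-group split leaves an empty string at the end when the input ends with a tag (and at the start when it begins with one). A minimal sketch of a wrapper that drops those empty strings, assuming empty text segments are never wanted; `splitKeepingTags` is a hypothetical name, not a built-in:

```javascript
// Hypothetical helper: split on <number> tags, keep the tags, and drop the
// empty strings split() produces at the ends or between adjacent tags.
function splitKeepingTags(str) {
  return str.split(/(<\d{1,3}>)/).filter(function (part) {
    return part !== "";
  });
}

var tokens = splitKeepingTags("hello world <1> this is some random text <3> foo <12>");
// tokens → ["hello world ", "<1>", " this is some random text ", "<3>", " foo ", "<12>"]

// Label each token by testing it against the tag pattern.
var labelled = tokens.map(function (t) {
  return { type: /^<\d{1,3}>$/.test(t) ? "tag" : "text", value: t };
});
```

Once empty strings are filtered out, text and tags no longer strictly alternate (for example "<1><2>x" gives ["<1>", "<2>", "x"]), which is why the sketch classifies tokens by matching the pattern rather than by even or odd index.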

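Some older browsers (notably Internet Explorer before version 9) do not include captured groups in split() results. If you need to support them, you can build the array manually: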

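var str = "hello world <1> this is some random text <3> foo <12>",
    re = /<\d{1,3}>/g,   // the g flag makes exec() continue from re.lastIndex
    result = [],
    match,
    last_idx = 0;        // index just past the previous match

// exec() returns null once there are no more matches
while ((match = re.exec(str)) !== null) {
   // text between the previous match and this one, then the match itself
   result.push(str.slice(last_idx, match.index), match[0]);

   last_idx = re.lastIndex;
}
// remaining text after the last match ("" if the string ends with a match)
result.push(str.slice(last_idx));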
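On engines that support `String.prototype.matchAll` (ES2020 and later, so newer than this answer), the same loop can be written without managing `lastIndex` by hand. A sketch; `splitKeep` is a hypothetical helper name, and the regex passed in must have the `g` flag because `matchAll` throws a `TypeError` otherwise:

```javascript
// Hypothetical helper (assumes an ES2020+ engine with String.prototype.matchAll).
// Returns the text between matches interleaved with the matches themselves,
// following the same convention as split() with a capturing group.
function splitKeep(str, re) {
  const result = [];
  let lastIdx = 0; // index just past the previous match
  for (const m of str.matchAll(re)) {
    // text before this match, then the match itself
    result.push(str.slice(lastIdx, m.index), m[0]);
    lastIdx = m.index + m[0].length;
  }
  // remaining text after the last match ("" if the string ends with a match)
  result.push(str.slice(lastIdx));
  return result;
}

const pieces = splitKeep("hello world <1> this is some random text <3> foo <12>", /<\d{1,3}>/g);
// pieces → ["hello world ", "<1>", " this is some random text ", "<3>", " foo ", "<12>", ""]
```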