Split by spaces in JavaScript, except between quotes - but without matching the quotes themselves

Time: 2016-06-16 17:57:19

Tags: javascript regex

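This is a slight variation on a common question: how do you split a string on spaces, except when the space is enclosed in a pair of quotes (" or ')? There are many such questions here, and the best answer I have found so far is this one. The problem is that all of these answers include the quotes themselves in the match. For example: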

"foo bar 'i went to a bar'".match(/[^\s"']+|"([^"]*)"|'([^']*)'/g);

Result:

["foo", "bar", "'i went to a bar'"]

Is there a solution that results in:

["foo", "bar", "i went to a bar"]

Note that there is an edge case:

"foo bar \"'Hi,' she said, 'how are you?'\"".match(...);
=> // ["foo", "bar", "'Hi,' she said, 'how are you?'"]

That is, a substring should be able to contain quotes of its own, which means that aggressively stripping every quote like this won't work:

"foo bar \"'Hi,' she said, 'how are you?'\"".match(...).map(function(string) {
  return string.replace(/'|"/g, '');
});

Update

We could basically use this:

"foo bar \"'Hi,' she said, 'how are you?'\"".match(/[^\s"']+|"([^"]*)"|'([^']*)'/g).map(function(string) {
    return string.replace(/^('|")|('|")$/g, '');
});

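But that is awfully ugly. (It would also break edge cases where a token legitimately begins or ends with a quote character, such as a measurement written with foot and inch marks.) There must be a way to get this down to a single regex, right?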

2 answers:

Answer 0: (score: 2)

Your regex is good enough. You just need to loop over the matches and pick the capture group that actually matched:

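var re = /'([^'\\]*(?:\\.[^'\\]*)*)'|"([^"\\]*(?:\\.[^"\\]*)*)"|[^\s"']+/g;
var arr = ['foo bar "\'Hi,\' she said, \'how are you?\'"',
  'foo bar \'i went to a bar\'',
  'foo bar \'"Hi," she said, "how are you?"\'',
  '\'"Hi," she \\\'said\\\', "how are you?"\''
];

// Declare i with var so it does not leak as an implicit global.
for (var i = 0; i < arr.length; i++) {
  var m;
  var result = [];
  // Group 1 holds single-quoted content, group 2 double-quoted content.
  // For a bare word neither group is set, so fall back to the whole match.
  while ((m = re.exec(arr[i])) !== null) {
    // Guard against an endless loop on a zero-length match.
    if (m.index === re.lastIndex)
      re.lastIndex++;
    result.push(m[1] || m[2] || m[0]);
  }
  console.log(result);
}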
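
One caveat with m[1] || m[2] || m[0]: an empty quoted string captures "", which is falsy, so the expression falls through to m[0] and the quotes come back. If that case matters to you, a possible variation is to test the groups against undefined instead. This is only a sketch: it reuses the regex from this answer unchanged, and the function name splitKeepingQuoted is made up for illustration.

```javascript
// Hypothetical helper name; the regex is the one from the answer above.
function splitKeepingQuoted(str) {
  var re = /'([^'\\]*(?:\\.[^'\\]*)*)'|"([^"\\]*(?:\\.[^"\\]*)*)"|[^\s"']+/g;
  var result = [];
  var m;
  while ((m = re.exec(str)) !== null) {
    // A group that did not take part in the match is undefined,
    // whereas an empty quoted string yields "" (which is falsy).
    if (m[1] !== undefined) result.push(m[1]);
    else if (m[2] !== undefined) result.push(m[2]);
    else result.push(m[0]);
  }
  return result;
}

console.log(splitKeepingQuoted("foo '' bar"));
// ["foo", "", "bar"]
console.log(splitKeepingQuoted("foo bar \"'Hi,' she said, 'how are you?'\""));
// ["foo", "bar", "'Hi,' she said, 'how are you?'"]
```

The regex itself cannot produce a zero-length match (every alternative consumes at least one character), so the lastIndex guard is left out here.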

Answer 1: (score: 1)

Quoted strings are always fun. You need to test for an even or odd number of escape characters to know when the string terminates.

function quotedSplit(str) {
    let re = /'((?:(?:(?:\\\\)*\\')|[^'])*)'|"((?:(?:(?:\\\\)*\\")|[^"])*)"|(\w+)/g,
        arr = [],
        m;
    while(m = re.exec(str))
        arr.push(m[1] || m[2] || m[3]);

    return arr;
}

quotedSplit("fizz 'foo \\'bar\\'' buzz" + ' --- ' + 'fizz "foo \\"bar\\"" buzz');
// The backslashes stay in the captured text, so as string literals the result is:
// ["fizz", "foo \\'bar\\'", "buzz", "fizz", "foo \\\"bar\\\"", "buzz"]

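Here, the first two alternatives match the quoted strings, and the third matches a "word" (a run of \w characters, which is why the --- in the example is skipped).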
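
The even/odd rule can also be spelled out without a regular expression, which may make it easier to see what is being tested. The following is a minimal sketch, assuming backslash is the only escape character. isEscaped and findClosingQuote are hypothetical helper names, not part of the answer's code.

```javascript
// Hypothetical helper: true when the character at `pos` is preceded by an
// odd number of consecutive backslashes, i.e. when it is escaped.
function isEscaped(str, pos) {
  let backslashes = 0;
  for (let i = pos - 1; i >= 0 && str[i] === "\\"; i--) {
    backslashes++;
  }
  return backslashes % 2 === 1;
}

// Hypothetical helper: index of the unescaped quote that closes the string
// opened at index `open`, or -1 if the string is never terminated.
function findClosingQuote(str, open) {
  const quote = str[open];
  for (let i = open + 1; i < str.length; i++) {
    if (str[i] === quote && !isEscaped(str, i)) return i;
  }
  return -1;
}

// The two escaped quotes (one backslash each) are skipped.
console.log(findClosingQuote("'foo \\'bar\\'' buzz", 0)); // 12
// Two backslashes form an escaped backslash, so the quote after them closes.
console.log(findClosingQuote("'a\\\\' b", 0)); // 4
```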