Question

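I have a sentence like:

"[Paris:location] and [Lyon:location] are in France"

I need to extract all the tagged parts ("Paris:location" and "Lyon:location").

I tried this code, which uses a regular expression (RegExp):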

var regexEntity = new RegExp('\[.+:.+\]', 'g');

var text = '[Paris:location] and [Lyon:location] are in France';
while ((match = regexEntity.exec(text))) {
    console.log(match);
}

But this is the output I get, as if it were only detecting the colon:

[ ':',
  index: 6,
  input: '[Paris:location] and [Lyon:location] are in France' ]
[ ':',
  index: 26,
  input: '[Paris:location] and [Lyon:location] are in France' ]

Is something wrong with my regular expression? Is there another method you would use to get this information?

Answer 1

.+ is greedy; you need to use its lazy version: .+?.

Then it's simple:

var text = '[Paris:location] and [Lyon:location] are in France';
console.log(text.match(/\[.+?:.+?\]/g));
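
For what it's worth, greediness is probably not the only issue in the question's code. The pattern there is built from a string literal, where '\[' is just '[', so the RegExp ends up being a character class, which would explain the ':' matches. A rough sketch comparing the three variants (the variable names fromString, greedy and lazy are only for illustration):

```javascript
var text = '[Paris:location] and [Lyon:location] are in France';

// In a string literal '\[' is just '[', so the original pattern is really
// /[.+:.+]/g: a character class matching a single '.', '+' or ':'.
var fromString = new RegExp('\[.+:.+\]', 'g');
console.log(fromString.source);      // [.+:.+]
console.log(text.match(fromString)); // [ ':', ':' ]

// Doubling the backslashes fixes the escaping, but the greedy .+ then
// runs on to the last ']' and returns one long match.
var greedy = new RegExp('\\[.+:.+\\]', 'g');
console.log(text.match(greedy));     // [ '[Paris:location] and [Lyon:location]' ]

// The lazy quantifiers stop at the first ':' and the first ']'.
var lazy = new RegExp('\\[.+?:.+?\\]', 'g');
console.log(text.match(lazy));       // [ '[Paris:location]', '[Lyon:location]' ]
```

A regex literal such as /\[.+?:.+?\]/g avoids the double escaping altogether, which is why the one-liner above works as written.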

Answer 2

You could use a regular expression with a non-greedy search and a positive lookahead.

var regex = /\[(.*?)(?=:location)/gi,
    string = '"[Paris:location] and [Lyon:location] are in France"',
    match;
 
while ((match = regex.exec(string)) !== null) {
    console.log(match[1]);
}
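
Note that the lookahead above is tied to the literal text ":location". If other labels can appear, one possible variation is to capture both parts of each tag with negated character classes. This is only a sketch: extractTags and the value/label property names are made up for illustration, and String.prototype.matchAll assumes an ES2020 environment (a reasonably recent Node.js or browser).

```javascript
var text = '[Paris:location] and [Lyon:location] are in France';

// [^:\]]+ matches up to the next ':' or ']', so neither group can
// run past the brackets; no lazy quantifier is needed.
var tagPattern = /\[([^:\]]+):([^\]]+)\]/g;

// extractTags is a hypothetical helper name, not from the answers above.
function extractTags(input) {
    // matchAll yields one match per tag; m[1] and m[2] are the two groups.
    return Array.from(input.matchAll(tagPattern), function (m) {
        return { value: m[1], label: m[2] };
    });
}

console.log(extractTags(text));
```

For the sample sentence this should give one object per tag, with value set to the city name and label set to "location".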

Extracting semi-structured information from a string in JavaScript

2 answers: