Question

I am trying to extract sentences from a paragraph that has a format like

 Current. time is six thirty at Scotland. Past. time was five thirty at India; Current. time is five thirty at Scotland. Past. time was five thirty at Scotland. Current. time is five ten at Scotland.

When I use the regex

/current\..*scotland\./i

it matches the whole string

Current. time is six thirty at Scotland. Past. time was six thirty at India; Current. time is five thirty at Scotland. Past. time was five thirty at Scotland. Current. time is five ten at Scotland.

Instead, I want each match to stop at the first occurrence of ".", giving results like

 Current. time is six thirty at Scotland.
 Current. time is five ten at Scotland.

Similarly, for text like

 Past. time was five thirty at India; Current. time is six thirty at Scotland. Past. time was five thirty at Scotland. Past. time was five ten at India;

when I use the regex

 /past\..*india\;/i

it matches the whole string

 Past. time was five thirty at India; Current. time is six thirty at Scotland. Past. time was five thirty at Scotland. Past. time was five ten at India;

Here I want to capture all groups, or just the first group, as shown below. How do I stop at the first occurrence of ";"?

Past. time was five thirty at India; 
Past. time was five ten at India;

How do I make the regex stop at "." or ";" in the examples above?

Answer 1

There are a few things you shouldn't do with your regex. First, as Amal Murali pointed out, you shouldn't use a greedy quantifier; use the lazy version instead:

/current\..*?scotland\./i
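To make the greedy/lazy difference concrete, here is a minimal Ruby sketch run against the first sample paragraph from the question. The variable names are only illustrative, and Ruby is used because another answer in this thread uses `String#scan`.

```ruby
# Sample paragraph from the question.
text = "Current. time is six thirty at Scotland. Past. time was five thirty at India; " \
       "Current. time is five thirty at Scotland. Past. time was five thirty at Scotland. " \
       "Current. time is five ten at Scotland."

greedy = /current\..*scotland\./i
lazy   = /current\..*?scotland\./i

# Greedy .* runs to the LAST "Scotland." in the string: one big match.
greedy_matches = text.scan(greedy)

# Lazy .*? stops at the first "Scotland." after each "Current.".
lazy_matches = text.scan(lazy)

puts greedy_matches.length  # number of greedy matches
puts lazy_matches           # one sentence per line
```

Note that the lazy pattern returns every sentence that starts with "Current." and ends with "Scotland.", so the sample yields three matches.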

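I think reaching for the lazy form first is a good general rule with regex, since it is usually what you want. Second, you really don't want to use . to match everything here, because this part of the regex should not be allowed to match . or ;. You can put those characters in a negated character class, which matches anything except them: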

/current\.[^.]*?scotland\./i

and

/past\.[^.;]*?india;/i

or, to cover both:

/(current|past)\.[^.;]*?(india|scotland)[.;]/i

(Obviously this may not be what you want to do; it is only included to show how the pattern can be extended.)

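It is also a good rule of thumb that, if you are having trouble with a regex, you should make any wildcards more specific (in this case, changing from matching everything with . to matching everything except . and ; with [^.;]).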
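As a rough illustration of why the more specific wildcard matters, the Ruby sketch below runs a lazy-dot pattern and a negated-class pattern over the second sample paragraph from the question. The variable names are illustrative, and the groups are written as non-capturing `(?:...)` so that `scan` returns whole matches rather than group contents.

```ruby
# Second sample paragraph from the question.
text = "Past. time was five thirty at India; Current. time is six thirty at Scotland. " \
       "Past. time was five thirty at Scotland. Past. time was five ten at India;"

# Lazy dot only: the second match starts at "Past. ... Scotland." and
# keeps going until it finally finds "India;", spanning two sentences.
lazy_only = text.scan(/past\..*?india;/i)

# Negated class: the body may not contain "." or ";", so a match
# can never cross a sentence boundary.
bounded = text.scan(/past\.[^.;]*?india;/i)

# Combined pattern covering both keywords and both terminators.
both = text.scan(/(?:current|past)\.[^.;]*?(?:india|scotland)[.;]/i)

puts lazy_only.last  # the over-long match
puts bounded         # only the two "Past. ... India;" sentences
puts both.length     # every sentence in the paragraph
```

The lazy quantifier alone only guarantees the shortest match from a given starting point; it is the character class that keeps each match inside one sentence.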

Answer 2

As Amal said, your pattern is greedy, and you should append a ? to make it lazy. I would use the following to get the first string you asked for:

/^.*?current\..*?scotland\./i

And this one gets every group that follows the pattern, taking ';' as well as '.' into account:

/current\..*?scotland[.;]/i

This last one basically means: find any occurrence of 'current', and stop when you reach the first 'scotland' followed by a '.' or a ';'.
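A short Ruby sketch of how these two patterns might be applied, assuming the first sample paragraph from the question. `String#[]` with a regex returns only the first match, while `String#scan` returns all of them. The variable names are illustrative.

```ruby
# First sample paragraph from the question.
text = "Current. time is six thirty at Scotland. Past. time was five thirty at India; " \
       "Current. time is five thirty at Scotland. Past. time was five thirty at Scotland. " \
       "Current. time is five ten at Scotland."

# First match only; the leading ^.*? also consumes any text before "Current.".
first = text[/^.*?current\..*?scotland\./i]

# Every non-overlapping match, ending at "." or ";".
all = text.scan(/current\..*?scotland[.;]/i)

puts first
puts all.length
```

Because of the leading `^.*?`, the first pattern would include any text that precedes "Current." in its match. That makes no difference here, since the paragraph starts with "Current.".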

Answer 3

s = "Current. time is six thirty at Scotland. Past. time..."
# (?:Current|Past) is an alternation; [Current|Past] would be a character class
s.scan(/(?:Current|Past)\..*?[.;]/i)

#=> ["Current. time is six thirty at Scotland.", "Past. time was five thirty at India;",...]

How to make a regex stop at the first occurrence of "." and ";"
