Question

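Using the code below, I am trying to pick out the whole sentence (We quote as follows: After the biggest gold slump in three decades left investors heartbroken, they're following Taylor Swift's advice and never, ever getting back together). I can extract the string after "Swift", but not the characters before it.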

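This is just an example. I want code that selects the sentence from the text regardless of whether it appears at the beginning, at the end, or in the middle of the text.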

If possible, I would like to select the text between one full stop (.) and the next.

var str = "This rally in gold will fail!  The consensus on this    market view is as great as we saw at the beginning of the year on strong economic growth and rising interest rates!  Bloomberg captured the sentiment well in a June 24th article.  We quote as follows:   After the biggest gold slump in three decades left investors heartbroken, they\u2019re following Taylor Swift\u2019s advice and never, ever getting back together.";
var n = str.indexOf("Swift");
// Searching for "." from position n finds the full stop after "Swift";
// nothing here looks backwards for the start of the sentence.
var res = str.substr(n, str.indexOf(".", n) - n);
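The "text between one full stop and the next" idea can be sketched with `indexOf` and `lastIndexOf` alone: look backwards from the word for the previous terminator and forwards for the next one. This is a minimal sketch, not code from the thread; the helper name `sentenceAround` is hypothetical, and treating ".", "!" and "?" as the only sentence terminators is an assumption.

```javascript
// Hypothetical helper: returns the sentence around the first occurrence of `word`.
// Assumption: ".", "!" and "?" are the only sentence terminators.
function sentenceAround(text, word) {
  const n = text.indexOf(word);
  if (n === -1) return null;
  const terminators = [".", "!", "?"];

  // Start: one past the closest terminator at or before n (or 0 if there is none).
  let start = 0;
  for (const t of terminators) {
    start = Math.max(start, text.lastIndexOf(t, n) + 1);
  }

  // End: just past the first terminator after n (or the end of the text).
  let end = text.length;
  for (const t of terminators) {
    const i = text.indexOf(t, n);
    if (i !== -1) end = Math.min(end, i + 1);
  }

  // Drop the whitespace that followed the previous terminator.
  return text.slice(start, end).trim();
}

const text = "This rally in gold will fail!  The consensus on this    market view is as great as we saw at the beginning of the year on strong economic growth and rising interest rates!  Bloomberg captured the sentiment well in a June 24th article.  We quote as follows:   After the biggest gold slump in three decades left investors heartbroken, they\u2019re following Taylor Swift\u2019s advice and never, ever getting back together.";
const sentence = sentenceAround(text, "Swift");
console.log(sentence);
```

Because the search works outwards from the word, it finds the sentence whether it sits at the beginning, the middle, or the end of the text.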

Answer 1

Normally I would do this with a regular expression, like this:

(?<=\.\s+|^)[^.]*Swift[^.]*\.

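This basically means: »Take as many characters as possible that are not a full stop ([^.], repeated by the * after it); somewhere among them there must be "Swift". Before that there must be either the start of the text or the end of another sentence ((?<=\.\s+|^)). The match must end with a full stop (\.).«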
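A minimal sketch of trying the lookbehind pattern in JavaScript, assuming an engine with ES2018 lookbehind support (for example a current Node.js). A lookbehind only checks what precedes the match and does not consume it, so the match can still begin with whitespace after the previous full stop; the sketch trims it.

```javascript
// Assumes ES2018 lookbehind support (variable-length lookbehind, as in current V8).
const text = "This rally in gold will fail!  The consensus on this    market view is as great as we saw at the beginning of the year on strong economic growth and rising interest rates!  Bloomberg captured the sentiment well in a June 24th article.  We quote as follows:   After the biggest gold slump in three decades left investors heartbroken, they\u2019re following Taylor Swift\u2019s advice and never, ever getting back together.";

// Lookbehind: the match must follow "." plus whitespace, or sit at the start of the text.
const re = /(?<=\.\s+|^)[^.]*Swift[^.]*\./;
const m = text.match(re);

// The earliest position satisfying the lookbehind is after the first space,
// so any remaining leading whitespace is trimmed away.
const sentence = m ? m[0].trim() : null;
console.log(sentence);
```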

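However, this did not work in JavaScript when this answer was written: JavaScript had no lookbehind at all, let alone one of arbitrary length (needed here to skip the whitespace after the full stop). Lookbehind assertions, including variable-length ones, were added in ES2018, so modern engines accept the pattern above. For older engines, the best you can do is match the end of the previous sentence and remove it afterwards, or simply use a capturing group for the part you are interested in: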

(?:\.\s+|^)([^.]*Swift[^.]*\.)
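A minimal sketch of the capturing-group variant on the question's text. The non-capturing group consumes the previous full stop and the whitespace after it (or anchors at the start of the text), so capture group 1 (`m[1]`) holds just the sentence. No lookbehind is involved, so older engines run it too.

```javascript
const text = "This rally in gold will fail!  The consensus on this    market view is as great as we saw at the beginning of the year on strong economic growth and rising interest rates!  Bloomberg captured the sentiment well in a June 24th article.  We quote as follows:   After the biggest gold slump in three decades left investors heartbroken, they\u2019re following Taylor Swift\u2019s advice and never, ever getting back together.";

// (?:\.\s+|^) eats the previous sentence's "." and the spaces after it;
// ([^.]*Swift[^.]*\.) captures the sentence that contains "Swift".
const m = text.match(/(?:\.\s+|^)([^.]*Swift[^.]*\.)/);
const sentence = m ? m[1] : null;
console.log(sentence);
```

Because `\s+` is greedy and the rest of the pattern still succeeds, the captured sentence starts directly at "We" with no leading whitespace.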

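It really comes down to the pattern. You want a sentence that contains »Swift«. Such a sentence consists of the part before the search word (possibly empty) and the part after it (possibly empty), and it ends with a full stop. Once you look at the problem that way, turning it into a regular expression is fairly simple, as shown above.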

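In fact, we can do even better, because we know how a regex engine looks for matches: it tries each starting position from left to right and returns the first one that succeeds. So: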

[^.]*Swift[^.]*\.

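is also enough. The [^.]* part can never match a ".", so a match cannot begin inside the previous sentence. The earliest position where a match can start is therefore the beginning of the sentence containing the search word (including any whitespace after the previous full stop, which you may want to trim). Incidentally, this regex also works in older JavaScript engines, since it uses no lookbehind.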
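A minimal sketch of the shorter pattern with the `g` flag, so that every sentence containing the word is returned rather than only the first. The sample string is made up for illustration. Since `[^.]*` can absorb the spaces after the previous full stop, each match is trimmed.

```javascript
// Made-up sample text with two sentences that contain "Swift".
const sample = "Swift is fast. Go is too. I like Swift.";

// With the g flag, String.prototype.match returns all matches (or null if none).
const matches = (sample.match(/[^.]*Swift[^.]*\./g) || []).map((s) => s.trim());
console.log(matches);
```

Every match is anchored at a sentence start simply because `[^.]*` cannot reach back across a full stop, which is the argument made in the paragraph above.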

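What complicates matters is that sentences do not only end with a full stop; sometimes they end with an exclamation mark or a question mark, as Soana rightly pointed out (I had this in mind when reading the question but forgot it while writing the regex; short-term memory is funny). So the regex should look more like this: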

[^.!?]*Swift[^.!?]*[.!?]
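A minimal sketch that wraps the terminator-aware pattern in a function so the search word can vary. The helper name `findSentences` is hypothetical, not from the thread; the word is escaped before being inserted into `new RegExp(...)` so that characters such as "+" or "." in it are matched literally.

```javascript
// Hypothetical helper: all sentences (ending in ".", "!" or "?") that contain `word`.
function findSentences(text, word) {
  // Escape regex metacharacters so the word is matched literally.
  const escaped = word.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  const re = new RegExp(`[^.!?]*${escaped}[^.!?]*[.!?]`, "g");
  // match returns null when nothing is found; trim leading whitespace from each hit.
  return (text.match(re) || []).map((s) => s.trim());
}

const text = "This rally in gold will fail!  The consensus on this    market view is as great as we saw at the beginning of the year on strong economic growth and rising interest rates!  Bloomberg captured the sentiment well in a June 24th article.  We quote as follows:   After the biggest gold slump in three decades left investors heartbroken, they\u2019re following Taylor Swift\u2019s advice and never, ever getting back together.";
console.log(findSentences(text, "Swift"));
console.log(findSentences("Hello there. Is Swift fast? Yes!", "Swift"));
```

The trailing `[.!?]` matters as much as the negated classes: a sentence such as "Is Swift fast?" would not match if the pattern still ended in `\.`.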

Answer 2

var str = "This rally in gold will fail!  The consensus on this    market      view            is as great as we saw at the beginning of the year on strong economic growth and rising interest rates!  Bloomberg captured the sentiment well in a June 24th article.  We quote as follows:   After the biggest gold slump in three decades left investors heartbroken, they\u2019re following Taylor Swift\u2019s advice and never, ever getting back together.";

// Split the text at every full stop (the full stop itself is dropped)
// and print each piece that contains "Swift".
var a = str.split('.');
for (var i = 0; i < a.length; i++) {
    if (a[i].indexOf("Swift") > -1)
        console.log(a[i]);
}
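A variant of this split-and-filter approach, sketched under the assumption that ".", "!" and "?" all end sentences: instead of `split('.')`, which discards the full stop, `match` with the `g` flag returns each sentence together with its terminator. This is an illustration, not the answerer's code.

```javascript
const text = "This rally in gold will fail!  The consensus on this    market view is as great as we saw at the beginning of the year on strong economic growth and rising interest rates!  Bloomberg captured the sentiment well in a June 24th article.  We quote as follows:   After the biggest gold slump in three decades left investors heartbroken, they\u2019re following Taylor Swift\u2019s advice and never, ever getting back together.";

// Each sentence: a run of non-terminators followed by one or more of . ! ?
// (or by the end of the text, for a trailing fragment without punctuation).
const sentences = text.match(/[^.!?]+(?:[.!?]+|$)/g) || [];

// Trim the leading spaces and keep only the sentences that contain the word.
const hits = sentences.map((s) => s.trim()).filter((s) => s.includes("Swift"));
hits.forEach((s) => console.log(s));
```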

Extract the sentence in which a matching word occurs
