Question

I am trying to match multiple quotes within a single sentence, for example the line:

Hello "this" is a "test" example.

This is the regex I am using, but I am running into some problems:

/[^\.\?\!\'\"]{1,}[\"\'\“][^\"\'\“\”]{1,}[\"\'\“\”][^\.\?\!]{1,}[\.\?\!]/g

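What I am trying to do with this regex is find everything from the end of the previous sentence up to a quote, then find the closing quote, and continue until one of .?!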

The sample text I am testing with is from The Call of Cthulhu:

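What seemed to be the main document was headed “CTHULHU CULT” in characters painstakingly printed to avoid the erroneous reading of a word so unheard-of. The manuscript was divided into two sections, the first of which was headed “1925—Dream and Dream Work of H. A. Wilcox, 7 Thomas St., Providence, R.I.”, and the second, “Narrative of Inspector John R. Legrasse, 121 Bienville St., New Orleans, La., at 1908 A. A. S. Mtg.—Notes on Same, & Prof. Webb’s Acct.” The other manuscript papers were all brief notes, some of them accounts of the queer dreams of different persons, some of them citations from theosophical books and magazines.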

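The problem shows up at the sentence beginning The manuscript was.... Does anyone know how to account for repeated quotes like this? Or is there a better way to go about it?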

Answer 1

This one ignores [.?!] inside quotes. In that case, though, something like Acct.” The nth is treated as a single sentence. A . is probably missing there.


var r = 'What seemed to be the main document was headed “CTHULHU.?! CULT” in characters painstakingly printed to avoid the erroneous reading of a word so unheard-of. The manuscript was divided into two sections, the first of which was headed “1925—Dream and Dream Work of H. A. Wilcox, 7 Thomas St., Providence, R.I.”, and the second, “Narrative of Inspector John R. Legrasse, 121 Bienville St., New Orleans, La., at 1908 A. A. S. Mtg.—Notes on Same, & Prof. Webb’s Acct.” The other manuscript papers were all brief notes, some of them accounts of the queer dreams of different persons, some of them citations from theosophical books and magazines.'
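// Split on curly quotes: odd-indexed pieces are the quoted parts.
.split(/[“”]/g)
// Strip sentence-ending punctuation from the quoted parts only.
.map((x,i)=>(i%2)?x.replace(/[.?!]/g,''):x)
// Rejoin, marking each quote position with a straight single quote.
.join("'")
// Every remaining . ? ! is now a real sentence boundary.
.split(/[.?!]/g)
// Drop empty pieces and count the quote marks in each sentence.
.filter(x => x.trim()).map(x => ({
  sentence: x,
  quotescount: x.split("'").length - 1
}));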

console.log(r);


Answer 2

You can use this naive pattern:

/[^"'“.!?]*(?:"[^"]*"[^"'“.!?]*|'[^']*'[^"'“.!?]*|“[^”]*”[^"'“.!?]*)*[.!?]/

Details:

/
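[^"'“.!?]*          # anything that is not a quote or sentence-ending punctuation
(?:
    "[^"]*" [^"'“.!?]*    # a straight double-quoted run, then more plain text
  |
    '[^']*' [^"'“.!?]*    # a single-quoted run, then more plain text
  |
    “[^”]*” [^"'“.!?]*    # a curly-quoted run, then more plain text
)*
[.!?]               # the punctuation that ends the sentence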
/
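As a rough illustration of how the naive pattern behaves, the sketch below applies it with the g flag to a short text. The variable names and the sample string are made up for this example, and the double-quote branch is written as "[^"]*" so that it spans a whole quoted run. Note that the pattern treats every straight apostrophe as a quote, so the sample text deliberately avoids contractions.

```javascript
// Naive sentence pattern; the double-quote branch is written as "[^"]*" here.
const sentencePattern = /[^"'“.!?]*(?:"[^"]*"[^"'“.!?]*|'[^']*'[^"'“.!?]*|“[^”]*”[^"'“.!?]*)*[.!?]/g;

// Hypothetical sample input, not taken from the question.
const sampleText = 'Hello "this" is a "test" example. She said "Wait. Stop!" and left.';

// match() with the g flag returns every match; trim the leading spaces
// that belong to the gap between sentences.
const sentences = (sampleText.match(sentencePattern) || []).map(s => s.trim());

console.log(sentences);
```

The punctuation inside "Wait. Stop!" does not end the sentence, because the quoted run is consumed as a unit before the final [.!?] is tried.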

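If you want something more robust, you can emulate the "atomic grouping" feature, particularly if you are not sure that every opening quote has a closing quote (to prevent catastrophic backtracking):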

/(?=([^"'“.!?]*))\1(?:"(?=([^"]*))\2"[^"'“.!?]*|'(?=([^']*))\3'[^"'“.!?]*|“(?=([^”]*))\4”[^"'“.!?]*)*[.!?]/

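An atomic group forbids backtracking into it once it has closed. Unfortunately, this feature does not exist in JavaScript. But it can be emulated with a lookahead, which is naturally atomic, together with a capture group and a backreference: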

(?>expr) => (?=(expr))\1
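To make the equivalence more concrete, here is a minimal sketch comparing a plain greedy quantifier with the lookahead-plus-backreference emulation. The variable names and the test string are hypothetical and chosen only for illustration.

```javascript
// Plain greedy quantifier: a+ can give back one "a" so that "ab" still matches.
const backtracking = /^a+ab/;

// Emulated atomic group (?>a+): the lookahead captures every "a" and never
// gives any back, so nothing is left for the literal "ab".
const emulatedAtomic = /^(?=(a+))\1ab/;

const input = 'aaab'; // hypothetical test string

console.log(backtracking.test(input));   // true
console.log(emulatedAtomic.test(input)); // false
```

The second pattern fails precisely because the engine cannot re-enter the lookahead to try a shorter capture, which is the behaviour an atomic group would provide.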
