Question

Apologies if this has been asked before, but I would like to get an array of words from a string like this:

"Exclamation! Question? \"Quotes.\" 'Apostrophe'. Wasn't. 'Couldn't'. \"Didn't\"."

The array should look like this:

[
  "exclamation",
  "question",
  "quotes",
  "apostrophe",
  "wasn't",
  "couldn't",
  "didn't"
]

Currently I'm using this expression:

sentence.toLowerCase().replace(/[^\w\s]/gi, "").split(" ");

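The problem is that it strips apostrophes from words like "wasn't", turning it into "wasnt".

I can't figure out how to keep the apostrophes in words like this.

Any help would be greatly appreciated!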

var sentence = "Exclamation! Question? \"Quotes.\" 'Apostrophe'. Wasn't. 'Couldn't'. \"Didn't\".";
console.log(sentence.toLowerCase().replace(/[^\w\s]/gi, "").split(" "));

Answer 1

Working around your own solution would be tricky, but you could account for apostrophes this way:

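const sentence = `"Exclamation! Question? \"Quotes.\" 'Apostrophe'. Wasn't. 'Couldn't'. \"Didn't\"."`;
console.log(
    // Lowercase first, as the question asks, then match words with internal apostrophes.
    sentence.toLowerCase().match(/\w+(?:'\w+)*/g)
);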

Note: the quantifier was changed from ? to * to allow multiple apostrophes within a word.
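
As a minimal sketch of the pattern above, it can be wrapped in a small helper that lowercases the input, as the question asks. The name extractWords and the sample string are hypothetical and not part of the original answer; the fallback to an empty array is an added assumption, since match() returns null when nothing matches.

```javascript
// Hypothetical helper (the name is illustrative, not from the answer).
// Lowercases the input, then collects runs of word characters that may
// contain one or more internal apostrophes.
function extractWords(text) {
  // String.prototype.match returns null when there is no match,
  // so fall back to an empty array.
  return text.toLowerCase().match(/\w+(?:'\w+)*/g) || [];
}

// "Rock'n'roll" has two apostrophes, which is the case the * quantifier covers.
const sample = "Exclamation! 'Couldn't'. Rock'n'roll.";
console.log(extractWords(sample));
```

Because an apostrophe is only consumed when word characters follow it, the single quotes wrapping 'Couldn't' are left out while the apostrophe inside the word is kept.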

Answer 2

@revo's answer looks good; here is another option that should also work:

const input = "Exclamation! Question? \"Quotes.\" 'Apostrophe'. Wasn't. 'Couldn't'. \"Didn't\".";
console.log(input.toLowerCase().match(/\b[\w']+\b/g));

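Explanation:

\b matches a word boundary (the start or end of a word),
[\w']+ matches one or more letters, digits, underscores, or apostrophes (to exclude underscores, you can use [a-zA-Z0-9'] instead),
/g tells the regex to find every match of the pattern (not just the first).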
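
The two character classes described above can be compared side by side. This is only a sketch: the sample string and variable names are made up for illustration. One detail worth noting is that \b is always defined in terms of \w, which includes the underscore, so with [a-zA-Z0-9'] a token such as snake_case is skipped entirely rather than split into two words.

```javascript
// Illustrative input (not from the original thread).
const text = "'Couldn't' use snake_case.".toLowerCase();

// Original pattern: underscores count as word characters.
const withUnderscores = text.match(/\b[\w']+\b/g) || [];

// Variant without underscores in the character class. There is no word
// boundary between "e" and "_" or between "_" and "c", so neither half
// of "snake_case" can satisfy both \b anchors.
const withoutUnderscores = text.match(/\b[a-zA-Z0-9']+\b/g) || [];

console.log(withUnderscores);    // [ "couldn't", 'use', 'snake_case' ]
console.log(withoutUnderscores); // [ "couldn't", 'use' ]
```

In both cases the surrounding single quotes are dropped, because a match has to begin and end at a word boundary and a quote next to a space or period does not form one.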

JavaScript: remove punctuation from a string and split it into words?

2 answers: