Question

While reading up on and learning regular expressions, I've been struggling to work out why my regex doesn't do what I want.

My string is:

String sentence = "I would've rather stayed at home, than go to the Murphys' home, on the 'golden' weekend";

The replaceAll argument I'm currently using is:

String[] tokens = sentence.replaceAll("[^\\sA-Za-z']+", "").split("\\s+");

This gives me an array of tokens like:

tokens = {"I", "would've", "rather", "stayed", "at", "home", "than", "go", "to", "the", "Murphys'", "home", "on", "the", "'golden'", "weekend"};

But I want to strip the single quote from Murphys' so that it becomes Murphys, and from 'golden' so that it becomes golden, while would've stays as would've.

That would give me an array that looks like:

correctTokens = {"I", "would've", "rather", "stayed", "at", "home", "than", "go", "to", "the", "Murphys", "home", "on", "the", "golden", "weekend"};

Thanks a lot for your help.

Answer 1

Use replaceAll("[^\\h\\v\\p{L}']+|(?<=\\P{L}|^)'|'(?=\\P{L}|$)", "")

Explanation:

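[^\h\v\p{L}']+ one or more characters that are none of the following:
- Unicode (horizontal or vertical) whitespace
- a Unicode letter
- an apostrophe '
| or
(?<=\P{L}|^)' an apostrophe preceded by a non-letter or by the start of the input
| or
'(?=\P{L}|$) an apostrophe followed by a non-letter or by the end of the input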

For a demo, see regex101.com.
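
To see the pattern applied to the sentence from the question, here is a minimal sketch. The class name ApostropheTokenizer and the method tokenize are hypothetical names chosen for illustration, and the trim() call is an addition that is not part of the answer; it only guards against an empty first token when the input starts with whitespace.

```java
import java.util.Arrays;

class ApostropheTokenizer {
    // Pattern from the answer: removes runs of characters that are not
    // whitespace, letters, or apostrophes, and removes apostrophes that
    // sit at the edge of a word (next to a non-letter or the input boundary).
    private static final String STRIP =
            "[^\\h\\v\\p{L}']+|(?<=\\P{L}|^)'|'(?=\\P{L}|$)";

    // Hypothetical helper: clean the sentence, then split on whitespace.
    static String[] tokenize(String sentence) {
        // trim() is an assumption added here, not part of the answer.
        return sentence.replaceAll(STRIP, "").trim().split("\\s+");
    }

    public static void main(String[] args) {
        String sentence = "I would've rather stayed at home, than go to "
                + "the Murphys' home, on the 'golden' weekend";
        // Prints: [I, would've, rather, stayed, at, home, than, go, to,
        //          the, Murphys, home, on, the, golden, weekend]
        System.out.println(Arrays.toString(tokenize(sentence)));
    }
}
```

Because replaceAll evaluates the lookbehind and lookahead against the original input, an apostrophe next to a comma (as in 'golden',) is still treated as sitting at a word edge. The \h and \v classes need Java 8 or later.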

Answer 2

Try the regex \\s'|'\\s and replace each match with a space:

String sentence = "I would've rather stayed at home, than go to the Murphys' home, on the 'golden' weekend";

String[] tokens = sentence.replaceAll("\\s'|'\\s", " ").split("\\s+");

Output:

[I, would've, rather, stayed, at, home,, than, go, to, the, Murphys, home,, on, the, golden, weekend]
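
Note that this output still contains the commas (home,), and the pattern only matches an apostrophe that is next to whitespace, so one at the very start or end of the string would be left in place. One possible way around both, sketched here as an assumption rather than as part of this answer, is to keep the asker's original punctuation filter and then trim apostrophes from each token. The names EdgeApostropheTrimmer and tokenize are hypothetical.

```java
import java.util.Arrays;

class EdgeApostropheTrimmer {
    // Hypothetical helper: filter punctuation, split, then trim apostrophes
    // from the start and end of every token.
    static String[] tokenize(String sentence) {
        // The asker's original filter: drop everything except whitespace,
        // ASCII letters, and apostrophes, then split on whitespace.
        String[] tokens = sentence
                .replaceAll("[^\\sA-Za-z']+", "")
                .trim()
                .split("\\s+");
        for (int i = 0; i < tokens.length; i++) {
            // Remove leading and trailing apostrophes; inner ones are kept.
            tokens[i] = tokens[i].replaceAll("^'+|'+$", "");
        }
        return tokens;
    }

    public static void main(String[] args) {
        String sentence = "I would've rather stayed at home, than go to "
                + "the Murphys' home, on the 'golden' weekend";
        System.out.println(Arrays.toString(tokenize(sentence)));
    }
}
```

A token made up only of apostrophes becomes an empty string with this approach, so a caller may want to filter those out afterwards.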

How can I remove apostrophes at the start and end of words without removing the apostrophes inside words?
