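How do I remove apostrophes at the start and end of words without removing apostrophes inside words?

Time: 2018-10-18 18:01:16

Tags: java regex

While reading about and learning regular expressions, I have been trying to work out where my regex goes wrong.

My string is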

String sentence = "I would've rather stayed at home, than go to the Murphys' home, on the 'golden' weekend";

The replaceAll call I currently use is:

String[] tokens = sentence.replaceAll("[^\\sA-Za-z']+", "").split("\\s+");

This gives me an array of tokens like
tokens = {"I", "would've", "rather", "stayed", "at", "home", "than", "go", "to", "the", "Murphys'", "home", "on", "the", "'golden'", "weekend"};

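But I want the apostrophe removed from Murphys' to give Murphys, and from 'golden' to give golden, while would've stays as would've.

That should give me an array that looks like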
correctTokens = {"I", "would've", "rather", "stayed", "at", "home", "than", "go", "to", "the", "Murphys", "home", "on", "the", "golden", "weekend"};

Any help is much appreciated.

2 answers:

Answer 0: (score: 0)

Use replaceAll("[^\\h\\v\\p{L}']+|(?<=\\P{L}|^)'|'(?=\\P{L}|$)", "")

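Explanation:

  • [^\h\v\p{L}']+ matches one or more characters that are none of the following:
    • Unicode whitespace (horizontal or vertical)
    • a Unicode letter
    • the apostrophe '
  • |
  • (?<=\P{L}|^)' matches an apostrophe preceded by a non-letter or by the start of the input
  • |
  • '(?=\P{L}|$) matches an apostrophe followed by a non-letter or by the end of the input

For a demo, see regex101.com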
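
As a rough check of this pattern, here is a minimal, self-contained sketch that applies it to the sentence from the question and then splits on whitespace. The class name StripEdgeApostrophes and the helper tokenize are made up for illustration, and the trim() call is an added assumption to avoid an empty leading token. \h and \v as predefined character classes require Java 8 or later.

```java
import java.util.Arrays;

public class StripEdgeApostrophes {
    // The pattern from this answer: drop everything that is not whitespace,
    // a letter, or an apostrophe, plus any apostrophe at the edge of a word.
    static final String EDGE_REGEX =
            "[^\\h\\v\\p{L}']+|(?<=\\P{L}|^)'|'(?=\\P{L}|$)";

    // Hypothetical helper: clean the sentence, then split on whitespace.
    public static String[] tokenize(String sentence) {
        return sentence.replaceAll(EDGE_REGEX, "").trim().split("\\s+");
    }

    public static void main(String[] args) {
        String sentence = "I would've rather stayed at home, than go to the "
                + "Murphys' home, on the 'golden' weekend";
        // Prints: [I, would've, rather, stayed, at, home, than, go, to, the,
        //          Murphys, home, on, the, golden, weekend] (on one line)
        System.out.println(Arrays.toString(tokenize(sentence)));
    }
}
```

The apostrophe in would've survives because it has a letter on both sides, so neither the lookbehind nor the lookahead alternative can match it.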

Answer 1: (score: 0)

Try the regex \\s'|'\\s and replace each match with a single space

String sentence = "I would've rather stayed at home, than go to the Murphys' home, on the 'golden' weekend";

String[] tokens = sentence.replaceAll("\\s'|'\\s", " ").split("\\s+");

Output

[I, would've, rather, stayed, at, home,, than, go, to, the, Murphys, home,, on, the, golden, weekend]
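
The commas stay attached in the output above because this regex only touches apostrophes next to whitespace. One possible way to get the array the question asks for is to run the question's own punctuation-stripping replaceAll first and then apply this answer's replacement. The sketch below assumes that combination; the class name WhitespaceApostropheTokenizer and the helper tokenize are hypothetical.

```java
import java.util.Arrays;

public class WhitespaceApostropheTokenizer {
    // Hypothetical helper chaining the question's step with this answer's step.
    public static String[] tokenize(String sentence) {
        return sentence
                // Step from the question: keep only whitespace, ASCII letters, apostrophes.
                .replaceAll("[^\\sA-Za-z']+", "")
                // Step from this answer: an apostrophe next to whitespace becomes a space.
                .replaceAll("\\s'|'\\s", " ")
                .trim()
                .split("\\s+");
    }

    public static void main(String[] args) {
        String sentence = "I would've rather stayed at home, than go to the "
                + "Murphys' home, on the 'golden' weekend";
        // Prints: [I, would've, rather, stayed, at, home, than, go, to, the,
        //          Murphys, home, on, the, golden, weekend] (on one line)
        System.out.println(Arrays.toString(tokenize(sentence)));
    }
}
```

Note that an apostrophe at the very start or end of the input has no whitespace beside it, so this approach leaves it in place; for example, "'golden' weekend" yields the tokens 'golden and weekend.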