While reading up on and learning regular expressions, I've been struggling to figure out why my regex doesn't do what I want.
My string is:
String sentence = "I would've rather stayed at home, than go to the Murphys' home, on the 'golden' weekend";
The replaceAll call I'm currently using is:
String[] tokens = sentence.replaceAll("[^\\sA-Za-z']+", "").split("\\s+");
This gives me an array of tokens like:
tokens = {"I", "would've", "rather", "stayed", "at", "home", "than", "go", "to", "the", "Murphys'", "home", "on", "the", "'golden'", "weekend"};
But I want to strip the single quotes so that Murphys' becomes Murphys and 'golden' becomes golden, while would've stays as would've.
That would give me an array that looks like:
correctTokens = {"I", "would've", "rather", "stayed", "at", "home", "than", "go", "to", "the", "Murphys", "home", "on", "the", "golden", "weekend"};
Thanks a lot for your help.
Answer 0 (score: 0)
Use replaceAll("[^\\h\\v\\p{L}']+|(?<=\\P{L}|^)'|'(?=\\P{L}|$)", "")
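Explanation:
[^\h\v\p{L}']+
one or more characters that are not horizontal whitespace (\h), vertical whitespace (\v), a letter (\p{L}), or an apostrophe
|
or
(?<=\P{L}|^)'
an apostrophe preceded by a non-letter or by the start of input
|
or
'(?=\P{L}|$)
an apostrophe followed by a non-letter or by the end of input
For a demo, see regex101.com.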
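To try the pattern end to end, here is a minimal sketch that wraps it in a small class. The class name ApostropheTokenizer, the constant STRIP, and the tokenize helper are made up for illustration, and the trim() call is an extra precaution (not part of the answer above) so that leading whitespace does not produce an empty first token.

```java
import java.util.Arrays;

public class ApostropheTokenizer {
    // Removes every run of characters that are not whitespace, letters, or
    // apostrophes, plus any apostrophe that has a non-letter (or the edge of
    // the input) on at least one side.
    private static final String STRIP =
            "[^\\h\\v\\p{L}']+|(?<=\\P{L}|^)'|'(?=\\P{L}|$)";

    // Hypothetical helper: strip unwanted characters, then split on whitespace.
    static String[] tokenize(String sentence) {
        return sentence.replaceAll(STRIP, "").trim().split("\\s+");
    }

    public static void main(String[] args) {
        String sentence = "I would've rather stayed at home, than go to the "
                + "Murphys' home, on the 'golden' weekend";
        System.out.println(Arrays.toString(tokenize(sentence)));
    }
}
```

For the sentence in the question this should print [I, would've, rather, stayed, at, home, than, go, to, the, Murphys, home, on, the, golden, weekend]. The apostrophe in would've survives because it has a letter on both sides, so neither lookaround alternative matches it.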
Answer 1 (score: 0)
Try the regex: \\s'|'\\s
and replace each match with a space:
String sentence = "I would've rather stayed at home, than go to the Murphys' home, on the 'golden' weekend";
String[] tokens = sentence.replaceAll("\\s'|'\\s", " ").split("\\s+");
Output:
[I, would've, rather, stayed, at, home,, than, go, to, the, Murphys, home,, on, the, golden, weekend]
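Note that the commas are still attached to home, in that output, because this regex only touches apostrophes next to whitespace. One possible way to get the array the question asks for is to run the question's own punctuation strip first and then apply this replacement. The sketch below assumes that chaining; the class name SpaceQuoteTokenizer and the tokenize helper are hypothetical.

```java
import java.util.Arrays;

public class SpaceQuoteTokenizer {
    // Hypothetical helper chaining the question's strip with this answer's regex.
    static String[] tokenize(String sentence) {
        return sentence
                .replaceAll("[^\\sA-Za-z']+", "")  // drop punctuation, keep apostrophes
                .replaceAll("\\s'|'\\s", " ")      // drop apostrophes that touch whitespace
                .trim()
                .split("\\s+");
    }

    public static void main(String[] args) {
        String sentence = "I would've rather stayed at home, than go to the "
                + "Murphys' home, on the 'golden' weekend";
        System.out.println(Arrays.toString(tokenize(sentence)));
    }
}
```

This variant still has a gap: an apostrophe at the very start or end of the string has no whitespace next to it, so it is left in place. The lookaround version in the other answer covers that case.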