My application uses Spring Integration to poll email from an Outlook mailbox.
Because it receives the String (the email body) from an external system (Outlook), I have no control over its content.
For example:
String emailBodyStr = "rejected by sundar14-\u200B.";
Now I am trying to remove the Unicode character \u200B (zero width space) from this String.
Here is what I have tried already.
Try#1:
emailBodyStr = emailBodyStr.replaceAll("\u200B", "");
Try#2:
`emailBodyStr = emailBodyStr.replaceAll("\u200B", "").trim();`
Try#3 (using Apache Commons):
StringEscapeUtils.unescapeJava(emailBodyStr);
Try#4:
StringEscapeUtils.unescapeJava(emailBodyStr).trim();
Nothing has worked so far.
I print the String before and after the replacement using the code below.
logger.info("Comment BEFORE:{}",emailBodyStr);
logger.info("Comment AFTER :{}",emailBodyStr);
In the Eclipse console, the Unicode character is NOT printed:
Comment BEFORE:rejected by sundar14-.
But the same code prints the Unicode character in the Linux console, as below:
Comment BEFORE:rejected by sundar14-\u200B.
I have read some examples where str.replace() is recommended, but please note that those examples use JavaScript and PHP, not Java.
Answer 0 (score: 7)
Finally, I was able to remove the 'Zero Width Space' character by using a 'Unicode regex'.
// \p{Cf} matches every character in the Unicode "format" category, which includes U+200B
String plainEmailBody = emailBodyStr.replaceAll("\\p{Cf}", "");
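For readers who want to try the replacement in isolation, here is a minimal sketch. The class name FormatCharStripper and the method stripFormatChars are hypothetical names chosen for illustration, and the sample string is built by hand rather than read from a real Outlook message.

```java
public class FormatCharStripper {

    // Removes every character in the Unicode "format" (Cf) general category.
    // Hypothetical helper, not part of any library.
    static String stripFormatChars(String input) {
        return input.replaceAll("\\p{Cf}", "");
    }

    public static void main(String[] args) {
        // Build the sample with an explicit code point so the invisible
        // character is unambiguous in the source file.
        String emailBodyStr = "rejected by sundar14-" + (char) 0x200B + ".";
        String cleaned = stripFormatChars(emailBodyStr);

        System.out.println("length before: " + emailBodyStr.length());
        System.out.println("length after : " + cleaned.length());
        System.out.println(cleaned);
    }
}
```

Note that the Cf category is broader than the zero width space: it also covers characters such as the soft hyphen (U+00AD) and the byte order mark (U+FEFF), so this pattern removes those as well. If only U+200B should go, a narrower pattern may be the safer choice.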
See the reference to find the category of a Unicode character.
The Character class lists all of these Unicode categories.
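Since the invisible character does not show up in every console, it can help to dump the code points of the received String and check their category with the Character class. The sketch below is only an illustration: CategoryDump and describe are hypothetical names, and the two sample strings are assumptions about what the input might contain (the real character versus the six-character escape text).

```java
public class CategoryDump {

    // Lists each code point as U+XXXX and tags those whose general
    // category is Cf (Character.FORMAT). Hypothetical helper.
    static String describe(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
            if (Character.getType(cp) == Character.FORMAT) {
                sb.append("(Cf)");
            }
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        // Case 1: the String holds the real zero width space.
        String real = "14-" + (char) 0x200B + ".";
        // Case 2: the String holds a backslash followed by the text u200B.
        String literal = "14-" + (char) 0x5C + "u200B.";

        System.out.println(describe(real));
        System.out.println(describe(literal));
    }
}
```

If the dump shows a tagged Cf code point, the \p{Cf} replacement applies. If it shows U+005C followed by ordinary letters and digits instead, the String contains escape text rather than the character itself, and a category-based regex will not match it.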
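Note 1: When I receive this String from the Outlook email body, none of the approaches listed in my question work.
My application receives the String from an external system (Outlook), so I have no control over it.
Note 2: This SO answer helped me understand Unicode regex.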