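Suppose I have a hex string for an emoji, such as "1f1e81f1f3". It is a malformed hex representation of the emoji's code points; it should be 1f1e8 1f1f3.
I use org.apache.commons.codec.binary.Hex to decode hex strings, but Hex requires the input string to have an even length, so I need to zero-pad the hex string, for example "01f1e8 01f1f3".
Currently I simply replace "1f" with "01f". This has worked so far, but an emoji glyph may contain a sequence of Unicode characters.
The emoji hex string comes from a "<span class="emoji emojiXXXXXXXXXX"></span>" string in text messages retrieved from a popular IM application through an unofficial HTTP API.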
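To make the padding requirement concrete, here is a minimal sketch that uses only the JDK rather than commons-codec. It assumes each code point is left-padded to 8 hex digits (4 bytes) so that the bytes can be decoded as UTF-32BE; the class name `PaddedHexSketch` and the method `decodeCodePoint` are hypothetical and not part of any library.

```java
import java.nio.charset.Charset;

public class PaddedHexSketch {
    // Hypothetical helper: left-pads one code point's hex digits to 8 digits,
    // converts them to 4 bytes, and decodes those bytes as UTF-32BE.
    // Assumes the input has at most 8 hex digits.
    public static String decodeCodePoint(String hex) {
        StringBuilder padded = new StringBuilder(hex);
        while (padded.length() < 8) {
            padded.insert(0, '0');
        }
        byte[] bytes = new byte[4];
        for (int i = 0; i < 4; i++) {
            bytes[i] = (byte) Integer.parseInt(padded.substring(2 * i, 2 * i + 2), 16);
        }
        return new String(bytes, Charset.forName("UTF-32BE"));
    }

    public static void main(String[] args) {
        // "1f1e8" and "1f1f3" are the two code points from the example above.
        String flag = decodeCodePoint("1f1e8") + decodeCodePoint("1f1f3");
        System.out.println(flag.codePointCount(0, flag.length())); // prints 2
    }
}
```

Padding to an even length alone is not enough for this approach: UTF-32BE needs exactly 4 bytes per code point, which is why the sketch pads to 8 digits.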
Answer 0 (score: 0)
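I ended up writing a small function to restore the emoji characters.
Basic procedure:
Walk through the hex string from the beginning. If the substring at the current position starts with "1f", pad three zeros before the "1f", store the result into the new hex string, and advance the pointer by 5 positions. Otherwise, no zero padding is done: store the substring into the new hex string and advance the pointer by 4 positions.
It works, but it is not perfect: it may introduce errors if an emoji's hex string does not start with "1f", or if the length of its hex string is not 5.
Code snippet: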
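import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import org.apache.commons.codec.DecoderException;
import org.apache.commons.codec.binary.Hex;
import org.apache.commons.lang3.StringUtils;

// The members below belong inside a class.
public static final Charset UTF_32BE = Charset.forName ("UTF-32BE");
public static final String REGEXP_FindTransformedEmojiHexString = "<span class=\"emoji emoji(\\p{XDigit}+)\"></span>";
public static final Pattern PATTERN_FindTransformedEmojiHexString = Pattern.compile (REGEXP_FindTransformedEmojiHexString, Pattern.CASE_INSENSITIVE);

public static String RestoreEmojiCharacters (String sContent)
{
    boolean bMatched = false;
    StringBuffer sbReplace = new StringBuffer ();
    Hex hex = new Hex (StandardCharsets.ISO_8859_1);
    Matcher matcher = PATTERN_FindTransformedEmojiHexString.matcher (sContent);
    while (matcher.find ())
    {
        bMatched = true;
        String sEmojiHexString = matcher.group(1);
        // Collect all decoded characters of this match, so that
        // appendReplacement is called exactly once per match.
        StringBuilder sbEmoji = new StringBuilder ();
        try
        {
            for (int i=0; i<sEmojiHexString.length ();)
            {
                Charset charset = null;
                String sSingleEmojiGlyphHexString = null;
                String sStartString = StringUtils.substring (sEmojiHexString, i, i+2);
                if (StringUtils.startsWithIgnoreCase (sStartString, "1f"))
                {
                    // 5-digit code point: pad to 8 hex digits (4 bytes) for UTF-32BE.
                    sSingleEmojiGlyphHexString = "000" + StringUtils.substring (sEmojiHexString, i, i+5);
                    i += 5;
                    charset = UTF_32BE;
                }
                else
                {
                    // 4 hex digits (2 bytes): decode as one UTF-16BE code unit.
                    sSingleEmojiGlyphHexString = StringUtils.substring (sEmojiHexString, i, i+4);
                    i += 4;
                    charset = StandardCharsets.UTF_16BE;
                }
                byte[] arrayEmoji = (byte[])hex.decode (sSingleEmojiGlyphHexString);
                sbEmoji.append (new String (arrayEmoji, charset));
            }
            // quoteReplacement keeps '$' and '\' in the decoded text literal.
            matcher.appendReplacement (sbReplace, Matcher.quoteReplacement (sbEmoji.toString ()));
        }
        catch (DecoderException e)
        {
            // Decoding failed: the original <span> is left untouched in the output.
            e.printStackTrace();
        }
    }
    matcher.appendTail (sbReplace);
    if (bMatched)
        sContent = sbReplace.toString ();
    return sContent;
}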
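The same splitting procedure can also be sketched without commons-codec, by parsing each chunk as an integer and appending it as a code point. This is only an illustration under the same assumption as above (a chunk starting with "1f" is a 5-digit code point, anything else is 4 digits), not a general emoji decoder; `EmojiHexSketch` and `decode` are hypothetical names.

```java
public class EmojiHexSketch {
    // Hypothetical helper: splits a concatenated hex string into code points
    // using the "starts with 1f => 5 digits, otherwise 4 digits" heuristic.
    public static String decode(String hex) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < hex.length()) {
            // Case-insensitive check for "1f" at the current position.
            boolean fiveDigits = hex.regionMatches(true, i, "1f", 0, 2);
            int end = Math.min(i + (fiveDigits ? 5 : 4), hex.length());
            // parseInt throws NumberFormatException on non-hex input.
            out.appendCodePoint(Integer.parseInt(hex.substring(i, end), 16));
            i = end;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String flag = decode("1f1e81f1f3");
        // Prints the two code points, 1f1e8 and 1f1f3, one per line.
        flag.codePoints().forEach(cp -> System.out.println(Integer.toHexString(cp)));
    }
}
```

Because no byte array is involved, the even-length requirement of Hex does not arise here, although the heuristic's limitations described above still apply.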