Question

For example,

good flavor👍🏻

When I copy this string into Java code, it is automatically converted to:

String str = EmojiParser.removeAllEmojis("good flavor\uD83D\uDC4D\uD83C\uDffB");

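I am working in IntelliJ. The string that is converted automatically on paste passes through my code in the external library I use (stanford parser-chinese) without problems.

However, if I read the string directly from a file and apply the same function as above, it only removes the first two code units, "\uD83D\uDC4D". The remaining "\uD83C\uDffB" is not removed by the function, so my code fails inside the library.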
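To see what the string actually contains, it may help to print its code points. The four escapes are two UTF-16 surrogate pairs that encode two separate code points: U+1F44D (thumbs up) and U+1F3FB (a skin-tone modifier). A remover that only recognizes the base emoji would leave the modifier behind. This is a minimal sketch; the class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class CodePointDump {
    // Returns each Unicode code point of the input formatted as "U+XXXX".
    static List<String> codePoints(String s) {
        List<String> out = new ArrayList<>();
        s.codePoints().forEach(cp -> out.add(String.format("U+%04X", cp)));
        return out;
    }

    public static void main(String[] args) {
        String emoji = "\uD83D\uDC4D\uD83C\uDFFB";
        System.out.println(emoji.length());    // 4 (UTF-16 code units)
        System.out.println(codePoints(emoji)); // [U+1F44D, U+1F3FB]
    }
}
```

Running the same dump on the string read from the file and on the string pasted into the source would show whether the two inputs really differ.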

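In Java, how can I convert this compound emoji to Unicode so that my code passes safely? It would be enough if it could be converted into the same format I get when I paste it into my code.

In fact, this is not limited to emoji. Some other non-standard Unicode characters may have the same problem, and I am working with Asian languages.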

Edit

public List<CoreMap> annotate(@Nonnull final String document) {
    Annotation annotation = new Annotation(EmojiParser.removeAllEmojis(document));
    this.annotate(annotation);
    List<CoreMap> sentences = annotation.get(CoreAnnotations.SentencesAnnotation.class);
    if (sentences == null) {
        return new ArrayList<CoreMap>();
    }
    return sentences;
}

In this code, for the String above, 'annotation.get()' returns "null", which it should not. So I need to pre-format the 'document' string before passing it to 'new Annotation()'.
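One possible pre-formatting step, sketched under assumptions: if the leftover characters are the skin-tone modifiers U+1F3FB to U+1F3FF, they can be removed with a code-point regex. A blunter alternative drops every code point outside the Basic Multilingual Plane. The class and method names below are hypothetical, and the second method would also delete rare supplementary-plane CJK ideographs, which matters for Asian-language text.

```java
public class EmojiCleanup {
    // Removes the five Fitzpatrick skin-tone modifiers, U+1F3FB..U+1F3FF.
    static String stripSkinToneModifiers(String s) {
        return s.replaceAll("[\\x{1F3FB}-\\x{1F3FF}]", "");
    }

    // More aggressive: drops every supplementary code point (above U+FFFF).
    // Caution: this also removes CJK ideographs encoded outside the BMP.
    static String stripSupplementary(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        s.codePoints()
         .filter(cp -> !Character.isSupplementaryCodePoint(cp))
         .forEach(sb::appendCodePoint);
        return sb.toString();
    }

    public static void main(String[] args) {
        String in = "good flavor\uD83D\uDC4D\uD83C\uDFFB";
        System.out.println(stripSupplementary(in)); // good flavor
    }
}
```

Either method could be applied to 'document' before or after the 'EmojiParser.removeAllEmojis' call. Whether that is enough for the parser to accept the text is untested here.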

Edit 2: Reading the data from a file:

BufferedReader br = null;
List<Document> processedDocs = new ArrayList<>();
try {
    if (location == DataLocation.relativeToResources) {
        br = new BufferedReader(IOUtils.fileReaderAsResource(filePath));
    } else if (location == DataLocation.relativeToRoot) {
        br = new BufferedReader(new InputStreamReader(new FileInputStream(filePath)));
    } else {
        throw new RuntimeException("Unknown data location! " + location);
    }
    String line;
    int docNo = 0;
    List<Future<Document>> tasks = new ArrayList<>();
    while ((line = br.readLine()) != null) {
        line = line.trim();
        if (line.isEmpty()) {
            continue;
        }
        Callable<Document> callable = null;
        callable = new NLPThread(line, filePath, ++docNo, docSource);

        if (callable != null) {
            tasks.add(s_pool.submit(callable));
        }
    }
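One thing that may be worth ruling out: 'new InputStreamReader(new FileInputStream(filePath))' decodes with the platform default charset, which before Java 18 is not guaranteed to be UTF-8. If the file is UTF-8, naming the charset explicitly removes that variable. The sketch below is a standalone illustration with a hypothetical class name, not a drop-in replacement for the method above.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class Utf8LineReader {
    // Reads a file as UTF-8 and returns its trimmed, non-empty lines.
    static List<String> readNonEmptyLines(Path file) throws IOException {
        List<String> lines = new ArrayList<>();
        // try-with-resources closes the reader even if reading fails.
        try (BufferedReader br = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = br.readLine()) != null) {
                line = line.trim();
                if (!line.isEmpty()) {
                    lines.add(line);
                }
            }
        }
        return lines;
    }
}
```

'Files.newBufferedReader' throws an IOException on malformed input instead of silently substituting characters, so a file in the wrong encoding should fail loudly.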

How do I convert a complex emoji to Unicode?

0 answers: