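Converting emoji to HTML decimal codes or Unicode hex codes in Java

Time: 2017-02-09 20:33:09

Tags: java html html-entities emoji html-encode

I am trying to use Java to convert a text file that contains emoji into a file in which each emoji is replaced by its HTML decimal code or its hex code. For example: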

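Input: <div id="thread" style="white-space: pre-wrap;"><div>😀😀😃🍎🍏⚽️🏀

Expected output: <div id="thread" style="white-space: pre-wrap;"><div>&#128512;&#128512;&#128515;&#127822;&#127823;&#9917;&#65039;&#127936;

In the output above, '😀' should be changed to its corresponding HTML entity code '&#128512;'.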

Details of the HTML entity codes and hex codes are given here: http://character-code.com/emoticons-html-codes.php

The sample code I tried is below:

try {
    File file = new File("/inFile.txt");
    String str = FileUtils.readFileToString(file, "ISO-8859-1");
    System.out.println(new String(str.getBytes(), "UTF-8"));
    String results = StringEscapeUtils.escapeHtml4(str);
    System.out.println(results);
} catch (IOException e) {
    e.printStackTrace();
}
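One likely problem with the attempt above is that it decodes the file as ISO-8859-1. Assuming the file is actually stored as UTF-8, each emoji arrives as several unrelated Latin-1 characters, so no later escaping step can recognise it. The sketch below illustrates this with the four UTF-8 bytes of U+1F600; the class and method names (`DecodingDemo`, `decode`) are made up for the illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class DecodingDemo {
    // UTF-8 encoding of U+1F600 (grinning face): F0 9F 98 80
    static final byte[] GRINNING_FACE_UTF8 = {
        (byte) 0xF0, (byte) 0x9F, (byte) 0x98, (byte) 0x80
    };

    // Hypothetical helper: decode the same bytes with the given charset.
    static String decode(Charset charset) {
        return new String(GRINNING_FACE_UTF8, charset);
    }

    public static void main(String[] args) {
        String wrong = decode(StandardCharsets.ISO_8859_1);
        String right = decode(StandardCharsets.UTF_8);

        // ISO-8859-1 maps every byte to one char, so the emoji becomes 4 chars.
        System.out.println(wrong.length());                          // 4
        // UTF-8 yields one code point (stored as a surrogate pair, 2 chars).
        System.out.println(right.codePointCount(0, right.length())); // 1
        System.out.println(right.codePointAt(0));                    // 128512
    }
}
```

Reading the file with the charset it was really written in (for example passing "UTF-8" to `readFileToString`) keeps each emoji as a single code point that can then be converted.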

1 answer:

Answer 0 (score: 0)

I found a workaround:
import java.io.File;
import java.io.OutputStreamWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;

// Placeholder class name; the enclosing class was not shown.
public class HtmlDecimalCodeGenerator {

    public static void htmlDecimalCodeGenerator() {
        DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();
        domFactory.setValidating(false);
        File inputFile = new File("/inputFile.xml");

        try {
            DocumentBuilder builder = domFactory.newDocumentBuilder();
            Document doc = builder.parse(inputFile);

            TransformerFactory tf = TransformerFactory.newInstance();
            Transformer transformer = tf.newTransformer();

            // Without this, an <?xml version='1.0' encoding='...'?> declaration
            // is written at the beginning of the output.
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

            // With method "xml" the version defaults to 1.0; with "html" it
            // defaults to 4.0 (HTML 4.0); with "text" the version is ignored.
            transformer.setOutputProperty(OutputKeys.METHOD, "xml");

            // Whether the Transformer may add extra whitespace: "yes" or "no".
            transformer.setOutputProperty(OutputKeys.INDENT, "no");

            // ISO-8859-1 cannot represent emoji, so the serializer is expected
            // to write them as numeric character references instead.
            transformer.setOutputProperty(OutputKeys.ENCODING, "ISO-8859-1");

            transformer.transform(new DOMSource(doc),
                    new StreamResult(new OutputStreamWriter(System.out, "UTF-8")));
            // To write to a file instead of System.out, pass:
            // new StreamResult(new OutputStreamWriter(
            //         new FileOutputStream("/outputFile.xml"), "UTF-8"))
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
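
If the input is not well-formed XML, the same result can probably be reached without a Transformer by walking the string's code points directly. This is only a minimal sketch, not part of the workaround above: the class and method names (`EmojiEntities`, `toDecimalEntities`, `toHexEntities`) are hypothetical, and the choice to convert everything above U+007F while leaving markup such as `<div>` untouched is an assumption based on the expected output in the question.

```java
class EmojiEntities {

    // Replaces every code point above U+007F with a decimal reference
    // such as &#128512;. ASCII, including HTML markup, is copied as-is.
    static String toDecimalEntities(String text) {
        StringBuilder out = new StringBuilder(text.length());
        text.codePoints().forEach(cp -> {
            if (cp > 0x7F) {
                out.append("&#").append(cp).append(';');
            } else {
                out.appendCodePoint(cp);
            }
        });
        return out.toString();
    }

    // Same idea, but emits hexadecimal references such as &#x1F600;.
    static String toHexEntities(String text) {
        StringBuilder out = new StringBuilder(text.length());
        text.codePoints().forEach(cp -> {
            if (cp > 0x7F) {
                out.append("&#x")
                   .append(Integer.toHexString(cp).toUpperCase())
                   .append(';');
            } else {
                out.appendCodePoint(cp);
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        // U+1F600 (as a surrogate pair), then U+26BD followed by U+FE0F
        String input = "<div>\uD83D\uDE00\u26BD\uFE0F</div>";
        System.out.println(toDecimalEntities(input));
        // <div>&#128512;&#9917;&#65039;</div>
        System.out.println(toHexEntities(input));
        // <div>&#x1F600;&#x26BD;&#xFE0F;</div>
    }
}
```

Iterating with `codePoints()` rather than `charAt` matters here, because an emoji outside the Basic Multilingual Plane occupies two `char` values and would otherwise be split into two meaningless references. The input must have been read with the correct charset first.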