Question

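In my software, I receive an XML file that contains some HTML entities, such as &amp; or others. I can decode the XML, but not the HTML entities: strings are cut off wherever an HTML entity occurs. Can anyone help? This is the code I currently use to parse the XML: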

    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    DocumentBuilder builder = factory.newDocumentBuilder();
    InputStream inputStream = entity.getContent();
    Document dom = builder.parse(inputStream);
    inputStream.close();

    Element racine = dom.getDocumentElement();
    NodeList nodeLst = racine.getElementsByTagName("product");

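Does anyone know how I can do the same job, parsing the XML into a DOM object while also decoding the HTML entities?

As it stands, my DOM object is wrong, because it contains strings that were cut off at the HTML entities. What should I do?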

Answer 1

I have two suggestions:

Turn off validation: factory.setValidating(false);
Add an XHTML DTD declaration to your XML stream, immediately after the <?xml ...> declaration:

    <?xml version="1.0"?> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
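A variation on the second suggestion, as a minimal sketch rather than a definitive implementation: instead of referencing the external XHTML DTD (which the parser may try to download), declare the needed entities in an internal DOCTYPE subset. The class name `EntityAwareXmlParser`, the `rootName` parameter, and the three-entry entity table are hypothetical and only for illustration. The sketch assumes the incoming XML has no DOCTYPE of its own and was written against a standard JDK parser; Android's bundled parser may behave differently, so this is something to verify on the device.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

final class EntityAwareXmlParser {
    // Hypothetical, deliberately tiny entity table; extend it with whatever the feed uses.
    private static final String ENTITY_DECLARATIONS =
            "<!ENTITY nbsp \"&#160;\">"
          + "<!ENTITY eacute \"&#233;\">"
          + "<!ENTITY copy \"&#169;\">";

    /** Inserts a DOCTYPE with an internal subset right after the XML declaration, if there is one. */
    static String withEntityDeclarations(String xml, String rootName) {
        String doctype = "<!DOCTYPE " + rootName + " [" + ENTITY_DECLARATIONS + "]>";
        int declEnd = xml.startsWith("<?xml") ? xml.indexOf("?>") : -1;
        if (declEnd >= 0) {
            int insertAt = declEnd + 2; // just past the closing "?>"
            return xml.substring(0, insertAt) + doctype + xml.substring(insertAt);
        }
        return doctype + xml; // no XML declaration: the DOCTYPE goes first
    }

    /** Parses the XML into a DOM after adding the entity declarations. */
    static Document parse(String xml, String rootName) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            DocumentBuilder builder = factory.newDocumentBuilder();
            String patched = withEntityDeclarations(xml, rootName);
            return builder.parse(new InputSource(new StringReader(patched)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\"?>"
                + "<products><product>Caf&eacute;&nbsp;noir &amp; th&eacute;</product></products>";
        Document dom = parse(xml, "products");
        // Prints the product text with the named entities expanded.
        System.out.println(dom.getElementsByTagName("product").item(0).getTextContent());
    }
}
```

Because the declarations live inside the document itself, no network access is needed, but every entity the feed uses has to be listed in the table.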

Answer 2

I think this happens because the parser treats the apostrophe entity as the end of the string. I found a workaround:

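    // Read the whole response into a string and replace the apostrophe entity before parsing
    String stringDatosEntrada = new Scanner(urlConnection.getInputStream()).useDelimiter("\\A").next()
            .replaceAll("&amp;#39;", "'")
            .replaceAll("&#39;", "'");

    // Parse the cleaned string (not the original stream, which has already been consumed)
    InputStream is = new ByteArrayInputStream(stringDatosEntrada.getBytes());
    Document dom = builder.parse(is);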
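The same pre-processing idea can be generalised, as a rough sketch under stated assumptions: rather than chaining one replaceAll per entity, rewrite known HTML named entities into numeric character references, which any XML parser accepts. The class `HtmlEntityRewriter` and its four-entry table are hypothetical; names not in the table (including the predefined XML ones such as `&amp;`) are left untouched. Note that the regex also rewrites matches inside CDATA sections, which may or may not be what you want.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class HtmlEntityRewriter {
    // Hypothetical sample table: HTML entity name -> Unicode code point.
    private static final Map<String, Integer> CODE_POINTS = new HashMap<>();
    static {
        CODE_POINTS.put("nbsp", 160);
        CODE_POINTS.put("eacute", 233);
        CODE_POINTS.put("copy", 169);
        CODE_POINTS.put("euro", 8364);
    }

    // Matches named references such as &nbsp; but not numeric ones such as &#39;.
    private static final Pattern NAMED = Pattern.compile("&([A-Za-z][A-Za-z0-9]*);");

    /** Replaces known named entities with numeric references; unknown names stay as they are. */
    static String toNumericReferences(String xml) {
        Matcher m = NAMED.matcher(xml);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            Integer codePoint = CODE_POINTS.get(m.group(1));
            String replacement = (codePoint == null) ? m.group() : "&#" + codePoint + ";";
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(toNumericReferences("<p>Caf&eacute;&nbsp;&amp; &unknown;</p>"));
    }
}
```

The rewritten string can then be wrapped in a ByteArrayInputStream and passed to builder.parse, exactly as in the snippet above.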

Answer 3

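You could try Android's Html class (android.text.Html). It should do what you want. It does not recognise all HTML, but it seems to work for converting strings. Note that fromHtml takes a String, so the stream has to be read into one first:

    Html.fromHtml(htmlString)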

Here is a simple example:

    TextView tv = (TextView) findViewById(R.id.tv);
    String s = "<b>This is</b> my first <u>HTML String</u> &amp; it works well!";
    tv.setText(Html.fromHtml(s));

The TextView then shows the text with "This is" in bold, "HTML String" underlined, and &amp; decoded to a plain & character.