
I have an XML document with HTML markup inside some of its text elements, like this:

<my-element><p>This is an XML element<br/>
with HTML markup and chemical formulas <br/>
like water H<sub>2</sub>O, scientific notation like 1.32 x 10<sup>4</sup>, and other super- and <br/>
sub-script c<sub>h</sub><sup>a</sup><sub>r</sub><sup>a</sup>c<sub>t</sub><sub>e</sub><sup>r</sup><sub>s</sub> <sup>i</sup><sup>n</sup> Unicode.</p></my-element>

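I am using lxml's etree. The parser has both an XML mode and an HTML mode, but I have not found a way to parse the whole text string (the paragraph) so that it is rendered with Unicode characters, like this: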

This is an XML element
with HTML markup and chemical formulas 
like water H₂O, scientific notation like 1.32 x 10⁴, and other super- and 
sub-script cₕₐᵣₐcₜₑᵣₛ ⁱⁿ Unicode.

Is there another library that could help?
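
To make the goal concrete, here is a minimal sketch of the conversion described above, not a finished solution. It assumes a hand-written, deliberately partial lookup table, since Unicode only defines subscript and superscript forms for some characters. The helper name `render` and the tables `SUB` and `SUP` are hypothetical. It uses the standard library's `xml.etree.ElementTree`; the calls involved (`fromstring`, `.tag`, `.text`, `.tail`, iterating over children) are also available in `lxml.etree`.

```python
import xml.etree.ElementTree as ET  # lxml.etree provides the same calls used here

# Partial lookup tables (an assumption of this sketch). Characters that are
# not listed are passed through unchanged by str.translate().
SUB = str.maketrans("0123456789aehrst", "₀₁₂₃₄₅₆₇₈₉ₐₑₕᵣₛₜ")
SUP = str.maketrans("0123456789ainr", "⁰¹²³⁴⁵⁶⁷⁸⁹ᵃⁱⁿʳ")


def render(elem, table=None):
    """Flatten elem to plain text, translating text inside <sub>/<sup>."""
    if elem.tag == "sub":
        table = SUB
    elif elem.tag == "sup":
        table = SUP

    parts = []
    if elem.tag == "br":
        parts.append("\n")  # a line break becomes a newline

    text = elem.text or ""
    parts.append(text.translate(table) if table else text)

    for child in elem:
        parts.append(render(child, table))
        # The tail follows the child's closing tag, so it belongs to the
        # current element and uses the current element's table.
        tail = child.tail or ""
        parts.append(tail.translate(table) if table else tail)

    return "".join(parts)


sample = "<p>water H<sub>2</sub>O, 1.32 x 10<sup>4</sup>, c<sub>h</sub><sup>a</sup><sub>r</sub></p>"
print(render(ET.fromstring(sample)))
```

The sketch does not normalize whitespace, so a literal newline that follows a `<br/>` in the source would be kept in addition to the newline emitted for the tag itself.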

How to parse HTML markup into Unicode characters using Python

0 answers: