Is it possible to convert HTML to XHTML with Jsoup 1.8.1?

Posted: 2015-03-16 21:17:09

Tags: java html xhtml jsoup

String body = "<br>";
Document document = Jsoup.parseBodyFragment(body);
document.outputSettings().escapeMode(EscapeMode.xhtml);
String str = document.body().html();
System.out.println(str);

Expected: <br />

Actual: <br>

Can Jsoup convert HTML to XHTML?

3 answers:

Answer 0 (score: 27)

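See Document.OutputSettings.Syntax.xml. EscapeMode.xhtml only controls which entities are used when escaping text; whether void elements are written as <br> or <br /> is controlled by the output syntax: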

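import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

private String toXHTML(String html) {
    final Document document = Jsoup.parse(html);
    // XML syntax serializes void elements as <br /> instead of <br>
    document.outputSettings().syntax(Document.OutputSettings.Syntax.xml);
    // html() returns the whole document, including the <html>, <head> and <body> wrappers
    return document.html();
}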
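Applied to the question's own input, the same setting gives the expected result. The following is a minimal sketch, assuming jsoup 1.8.1 or later is on the classpath; the class and method names are illustrative and not part of jsoup. It parses a body fragment and returns only the body's contents, so no <html>/<head> wrapper is added, and it also sets EscapeMode.xhtml so that only XML-safe entities are used when escaping.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Entities;

public class FragmentToXhtml {

    // Hypothetical helper: converts an HTML fragment to XHTML-style markup.
    static String fragmentToXhtml(String html) {
        Document document = Jsoup.parseBodyFragment(html);
        Document.OutputSettings settings = document.outputSettings();
        // syntax(xml) is what makes jsoup close void elements as <br />
        settings.syntax(Document.OutputSettings.Syntax.xml);
        // escapeMode(xhtml) only restricts which named entities are used when escaping
        settings.escapeMode(Entities.EscapeMode.xhtml);
        // body().html() returns just the body's contents, without <html>/<head>
        return document.body().html();
    }

    public static void main(String[] args) {
        System.out.println(fragmentToXhtml("<br>")); // prints <br />
    }
}
```

Using body().html() rather than document.html() keeps the result a fragment, which matches what the question's original code was printing.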

Answer 1 (score: 7)

You need to tell jsoup whether the document should be serialized as HTML or as XML.

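import org.jsoup.Jsoup;

public String parserXHtml(String html) {
    org.jsoup.nodes.Document document = Jsoup.parseBodyFragment(html);
    // XML syntax makes jsoup close void elements (<br />) so the output can be read as XML
    document.outputSettings().syntax(org.jsoup.nodes.Document.OutputSettings.Syntax.xml);
    document.outputSettings().charset("UTF-8");
    // toString() outputs the full document, wrapping the fragment in <html><head></head><body>
    return document.toString();
}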
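Switching to XML syntax is meant to make the output parseable as XML, but it does not validate it against an XHTML DTD, and it may not be enough on its own: depending on the escape mode, named entities such as &nbsp; may be written out, and XML does not predefine them. A quick well-formedness check can be done with the JDK's built-in SAX parser. This is a minimal sketch using only the standard library; the class and method names are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class XmlCheck {

    // Hypothetical helper: returns true if the string parses as a well-formed XML document.
    static boolean isWellFormedXml(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            // DefaultHandler ignores the content and rethrows fatal (well-formedness) errors
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return true;
        } catch (SAXException e) {
            return false; // not well-formed, e.g. an unclosed <br> or an undeclared entity
        } catch (Exception e) {
            throw new IllegalStateException(e); // parser configuration or I/O problem
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormedXml("<p>a<br />b</p>")); // true
        System.out.println(isWellFormedXml("<p>a<br>b</p>"));   // false
    }
}
```

Note that a body fragment with several top-level elements is not a single XML document, so check the full document output (as returned by toString() or html()) rather than the fragment.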

Answer 2 (score: 2)

You can do this with the JTidy API, using jtidy-r938.jar.

The following method reads an HTML file and writes it out as XHTML:

import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import org.w3c.tidy.Tidy;

public static String getXHTMLFromHTML(String inputFile, String outputFile) throws IOException {
    // try-with-resources closes both streams, even if parsing fails;
    // a missing input file is reported to the caller instead of being swallowed
    try (InputStream is = new FileInputStream(inputFile);
         OutputStream fos = new FileOutputStream(outputFile)) {
        Tidy tidy = new Tidy();
        tidy.setXHTML(true); // emit XHTML instead of HTML
        tidy.parse(is, fos); // read HTML from is, write the cleaned-up XHTML to fos
    }
    // return the path of the file the XHTML was written to
    return outputFile;
}