Question

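I'm using JTidy (the Java port of the HTML Tidy library) to clean up some existing websites. With my configuration, JTidy seems to be very strict and ends up cutting off the bottom of the page, where the markup is bad.

When I run the same markup through the W3C HTML validator's cleanup tool, it also cleans it up, but it is smarter about the rewrite: instead of chopping tags off, it guesses where the missing tags belong and updates the structure accordingly.

Does anyone know which HTML Tidy configuration the W3C uses?

My JTidy configuration is as follows: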

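    import org.w3c.tidy.Tidy;

    Tidy tidy = new Tidy();
    tidy.setTidyMark(false);          // no "generator" meta tag in the output
    tidy.setXHTML(true);              // emit XHTML
    tidy.setXmlOut(false);
    tidy.setNumEntities(true);        // numeric rather than named entities
    tidy.setSpaces(2);                // indent by two spaces
    tidy.setWraplen(2000);            // effectively no line wrapping
    tidy.setUpperCaseTags(false);
    tidy.setUpperCaseAttrs(false);
    tidy.setQuiet(false);
    tidy.setMakeClean(true);          // replace presentational markup with style rules
    tidy.setShowWarnings(true);
    tidy.setBreakBeforeBR(true);
    tidy.setHideComments(true);       // drop comments from the output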

Answer 1

The Tidy configuration used by the W3C validator is available here.
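
To compare the setup in the question against a Tidy configuration written as option names, it may help to express the setter calls in that form. The sketch below is only an illustration: the class name `TidyOptions` is hypothetical, and the setter-to-option mapping is my assumed correspondence based on HTML Tidy's documented option names, so check it against the JTidy version you use.

```java
import java.util.Properties;
import java.util.TreeSet;

// Hypothetical helper: the question's JTidy setters written as
// HTML Tidy option names (assumed mapping, not taken from the W3C config).
public class TidyOptions {

    public static Properties asProperties() {
        Properties p = new Properties();
        p.setProperty("tidy-mark", "no");            // setTidyMark(false)
        p.setProperty("output-xhtml", "yes");        // setXHTML(true)
        p.setProperty("output-xml", "no");           // setXmlOut(false)
        p.setProperty("numeric-entities", "yes");    // setNumEntities(true)
        p.setProperty("indent-spaces", "2");         // setSpaces(2)
        p.setProperty("wrap", "2000");               // setWraplen(2000)
        p.setProperty("uppercase-tags", "no");       // setUpperCaseTags(false)
        p.setProperty("uppercase-attributes", "no"); // setUpperCaseAttrs(false)
        p.setProperty("quiet", "no");                // setQuiet(false)
        p.setProperty("clean", "yes");               // setMakeClean(true)
        p.setProperty("show-warnings", "yes");       // setShowWarnings(true)
        p.setProperty("break-before-br", "yes");     // setBreakBeforeBR(true)
        p.setProperty("hide-comments", "yes");       // setHideComments(true)
        return p;
    }

    public static void main(String[] args) {
        Properties p = asProperties();
        // Print the options sorted by name, one "name: value" pair per line,
        // so they can be compared side by side with another configuration.
        for (String name : new TreeSet<>(p.stringPropertyNames())) {
            System.out.println(name + ": " + p.getProperty(name));
        }
    }
}
```

Printing the options sorted by name makes it easier to spot where the two configurations differ.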

JTidy (HTML Tidy) configuration used on the W3C HTML Validator
