Question

<div style="font-size:20px;font-family:arial;x:5;color:red;">Test</div><span/>

The <span /> is empty, and it causes the PDF exception thrown at this line:

var elements = HTMLWorker.ParseToList(reader, style, null);


Answer 1

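A span has an opening and a closing tag, and your slash is in the wrong place. The error means the parser found something other than the closing angle bracket at this point: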

 <span [HERE] >

What you currently have is:

<span />
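If you cannot change where the markup comes from, one stdlib-only workaround is to expand self-closing non-void tags (such as <span/>) into an explicit open/close pair before handing the HTML to the parser. This is a minimal sketch that swaps in a regular-expression substitution rather than a real HTML parser, so it assumes fairly simple markup (for example, no "/>" inside attribute values). The class and method names are hypothetical, chosen for illustration only.

```java
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: a regex-based sketch, not a full HTML parser.
public class SelfClosingFixer {
    // HTML void elements legitimately have no closing tag, so they are left untouched.
    private static final Set<String> VOID_ELEMENTS = Set.of(
            "area", "base", "br", "col", "embed", "hr", "img", "input",
            "link", "meta", "param", "source", "track", "wbr");

    // Matches a self-closing tag such as <span/> or <span class="x" />.
    // Group 1 = tag name, group 2 = attributes (possibly empty).
    private static final Pattern SELF_CLOSING =
            Pattern.compile("<([a-zA-Z][a-zA-Z0-9]*)((?:\\s+[^<>]*?)?)\\s*/>");

    /** Rewrites <tag .../> as <tag ...></tag> for every non-void element. */
    public static String expandSelfClosing(String html) {
        Matcher m = SELF_CLOSING.matcher(html);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String name = m.group(1);
            String attrs = m.group(2).stripTrailing(); // drop the space before "/>"
            String replacement = VOID_ELEMENTS.contains(name.toLowerCase(Locale.ROOT))
                    ? m.group()                                   // keep <br/> etc. as-is
                    : "<" + name + attrs + "></" + name + ">";    // <span/> -> <span></span>
            // quoteReplacement stops '$' or '\' in attribute values being treated specially
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String input = "<div style=\"font-size:20px;font-family:arial;x:5;color:red;\">Test</div><span/>";
        System.out.println(expandSelfClosing(input));
        // prints: <div style="font-size:20px;font-family:arial;x:5;color:red;">Test</div><span></span>
    }
}
```

A regex is used here only because it needs no extra dependency; for anything beyond trivial markup, running the HTML through a real parser is the more robust choice.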

Answer 2

In the comments, you said you know that <span></span> is correct, but that you are dealing with existing code.

In my comment, I referred to my answer to this question: How to do HTML to XML conversion to generate closed tags?

In that answer, I explain that it's good practice to run "bad HTML" through jsoup before feeding it to XML Worker. I applied the code from the D00_HTML example to your snippet.

This:

<div style="font-size:20px;font-family:arial;x:5;color:red;">Test</div><span/>

was converted into this:

<html>
 <head></head>
 <body>
  <div style="font-size:20px;font-family:arial;x:5;color:red;">
   Test
  </div>
  <span></span>
 </body>
</html>

using this code:

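import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import org.jsoup.Jsoup;

public static void tidyUp(String path) throws IOException {
    File html = new File(path);
    // jsoup parses the (possibly malformed) HTML and re-serializes it with explicit closing tags
    byte[] xhtml = Jsoup.parse(html, "US-ASCII").html().getBytes();
    File dir = new File("results/xml");
    dir.mkdirs();
    // try-with-resources closes the stream even if write() throws
    try (FileOutputStream fos = new FileOutputStream(new File(dir, html.getName()))) {
        fos.write(xhtml);
    }
}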
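The premise of the approach above is that XML Worker, being XML-based, needs markup with properly closed tags. Assuming you want to sanity-check the tidied output before passing it on, the JDK's built-in DOM parser can tell you whether a string is a single well-formed XML document. This is a hypothetical helper for illustration, not part of jsoup or iText. Note that the original snippet fails for a reason other than <span/> (which is itself legal XML): it has two top-level elements, whereas the jsoup output is wrapped in a single <html> root.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical helper: checks XML well-formedness using only the JDK.
public class WellFormedCheck {
    /** Returns true if the string parses as a single well-formed XML document. */
    public static boolean isWellFormedXml(String markup) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // Replace the default handler, which prints "[Fatal Error]" lines to stderr.
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { }
                public void error(SAXParseException e) { }
                public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            builder.parse(new ByteArrayInputStream(markup.getBytes(StandardCharsets.UTF_8)));
            return true;
        } catch (SAXException e) {
            return false; // the markup is not well-formed XML
        } catch (Exception e) {
            throw new IllegalStateException(e); // parser configuration or I/O problem
        }
    }

    public static void main(String[] args) {
        String original = "<div style=\"font-size:20px;font-family:arial;x:5;color:red;\">Test</div><span/>";
        String tidied = "<html><head></head><body>"
                + "<div style=\"font-size:20px;font-family:arial;x:5;color:red;\">Test</div>"
                + "<span></span></body></html>";
        System.out.println(isWellFormedXml(original)); // false: two top-level elements
        System.out.println(isWellFormedXml(tidied));   // true
    }
}
```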

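I also want to point out that using HTMLWorker is not a good idea. That class has been abandoned and is no longer supported; it may be removed in a future release. You should use XML Worker instead. You can find XML Worker examples on the iText site as well as in the book The Best iText Questions on StackOverflow (the book is free, so you may want to download it; your question and its answer were selected as among the best).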
