How do I retrieve a code snippet from an HTML page using Jsoup?

Posted: 2015-04-23 10:30:42

Tags: java jsp html-parsing jsoup

I want to parse an HTML page with Jsoup.


html.page

<html>
<head></head>
<body>
   <div id="1">SomeText</div>
     <script>(function(a, b)){var fjs = a.getElementsByTagNames(b)[0]; … }      
     </script>
   <div class="class1">SomeText</div>
   <div class="class2">SomeText</div>
     <script>(function(c, d)){var fjs = c.getElementsByTagNames(d)[0]; … }     
     </script>
   <div class="class3">SomeText</div>
   <div class="class4">SomeText</div>
</body>
</html>

To retrieve some of this information, I wrote the following code:

import java.io.File;
import java.io.PrintWriter;

// document is the org.jsoup.nodes.Document parsed from the page above
File output = new File(filePath);
PrintWriter writer = new PrintWriter(output, "UTF-8");
writer.write(document.getElementById("1").outerHtml() + "\n");
writer.write(document.getElementsByClass("class1").outerHtml() + "\n");
writer.write(document.getElementsByClass("class2").outerHtml() + "\n");
writer.flush();
writer.close();

The output file contains:

<div id="1">SomeText</div>
<div class="class1">SomeText</div>
<div class="class2">SomeText</div>

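What is the best way to get the output file to contain the following, with the first script included between the id="1" div and the class1 div?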

<div id="1">SomeText</div>
<script>(function(a, b)){var fjs = a.getElementsByTagNames(b)[0]; … }     
</script>
<div class="class1">SomeText</div>
<div class="class2">SomeText</div>

2 answers:

Answer 0 (score: 1)

Try using getElementsByTag and write the result to the file at the position where you want it.
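A minimal sketch of this suggestion, assuming only the first script on the page is wanted and the page is available as a String. FirstScriptExample and extract are hypothetical names used for illustration. getElementsByTag("script") returns every script element in document order, and first() takes the earliest one, which on this page is the one directly after the id="1" div; the pieces are then appended in the order the output file should have them.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class FirstScriptExample {

    // Collects #1, the first <script> in the page, then .class1 and .class2, in that order.
    static String extract(String html) {
        Document document = Jsoup.parse(html);
        StringBuilder out = new StringBuilder();
        Element first = document.getElementById("1");
        if (first != null) {
            out.append(first.outerHtml()).append("\n");
        }
        // getElementsByTag returns all <script> elements in document order; first() may be null
        Element firstScript = document.getElementsByTag("script").first();
        if (firstScript != null) {
            out.append(firstScript.outerHtml()).append("\n");
        }
        out.append(document.getElementsByClass("class1").outerHtml()).append("\n");
        out.append(document.getElementsByClass("class2").outerHtml()).append("\n");
        return out.toString();
    }

    public static void main(String[] args) {
        // Simplified stand-in for the page in the question (script bodies are placeholders)
        String html = "<html><head></head><body>"
            + "<div id=\"1\">SomeText</div>"
            + "<script>var a = 1;</script>"
            + "<div class=\"class1\">SomeText</div>"
            + "<div class=\"class2\">SomeText</div>"
            + "<script>var b = 2;</script>"
            + "<div class=\"class3\">SomeText</div>"
            + "</body></html>";
        System.out.print(extract(html));
    }
}
```

This only works while the wanted script stays the first one on the page; if another script is added earlier (for example in head), first() will pick that one instead.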

Answer 1 (score: 0)

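import java.io.File;
import java.io.PrintWriter;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

// document is the org.jsoup.nodes.Document parsed from the page above
File output = new File(filePath);
try (PrintWriter writer = new PrintWriter(output, "UTF-8")) {
    writer.write(document.getElementById("1").outerHtml() + "\n");
    // Keep only the <script> whose code starts with "(function(a, b)",
    // i.e. the one that sits between #1 and .class1 in the page
    Elements scripts = document.getElementsByTag("script");
    for (Element script : scripts) {
        // data() returns the raw script body; trim() tolerates a leading newline
        if (script.data().trim().startsWith("(function(a, b)")) {
            writer.write(script.outerHtml() + "\n");
        }
    }
    writer.write(document.getElementsByClass("class1").outerHtml() + "\n");
    writer.write(document.getElementsByClass("class2").outerHtml() + "\n");
} // try-with-resources flushes and closes the writer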
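Matching on the script's text breaks as soon as the inline code changes. A different technique, sketched here under the assumption that the wanted script is always the element immediately following the id="1" div, is to step to it with Jsoup's nextElementSibling(), which skips whitespace text nodes between tags. SiblingScriptExample and collect are hypothetical names for illustration.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class SiblingScriptExample {

    // Collects #1, the <script> directly after it (if there is one), then .class1 and .class2.
    static String collect(Document document) {
        StringBuilder out = new StringBuilder();
        Element first = document.getElementById("1");
        if (first == null) {
            return "";
        }
        out.append(first.outerHtml()).append("\n");
        // nextElementSibling() returns the next element at the same level, or null
        Element next = first.nextElementSibling();
        if (next != null && next.tagName().equals("script")) {
            out.append(next.outerHtml()).append("\n");
        }
        out.append(document.getElementsByClass("class1").outerHtml()).append("\n");
        out.append(document.getElementsByClass("class2").outerHtml()).append("\n");
        return out.toString();
    }

    public static void main(String[] args) {
        // Simplified stand-in for the page in the question (script bodies are placeholders)
        String html = "<html><head></head><body>\n"
            + "<div id=\"1\">SomeText</div>\n"
            + "<script>var a = 1;</script>\n"
            + "<div class=\"class1\">SomeText</div>\n"
            + "<div class=\"class2\">SomeText</div>\n"
            + "<script>var b = 2;</script>\n"
            + "</body></html>";
        System.out.print(collect(Jsoup.parse(html)));
    }
}
```

The tagName() check keeps the output correct on pages where no script follows the id="1" div; in that case only the three divs are written.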