I want to parse an HTML page with Jsoup.
html.page
<html>
<head></head>
<body>
<div id="1">SomeText</div>
<script>(function(a, b)){var fjs = a.getElementsByTagNames(b)[0]; … }
</script>
<div class="class1">SomeText</div>
<div class="class2">SomeText</div>
<script>(function(c, d)){var fjs = c.getElementsByTagNames(d)[0]; … }
</script>
<div class="class3">SomeText</div>
<div class="class4">SomeText</div>
</body>
</html>
To extract some of the information, I wrote this code:
File input = new File(filePath);
PrintWriter writer = new PrintWriter(input, "UTF-8");
writer.write(document.getElementById("1").outerHtml() + "\n");
writer.write(document.getElementsByClass("class1").outerHtml() + "\n");
writer.write(document.getElementsByClass("class2").outerHtml() + "\n");
writer.flush();
writer.close();
The output file contains:
<div id="1">SomeText</div>
<div class="class1">SomeText</div>
<div class="class2">SomeText</div>
What is the best way to get the following content in the output file instead?
<div id="1">SomeText</div>
<script>(function(a, b)){var fjs = a.getElementsByTagNames(b)[0]; … }
</script>
<div class="class1">SomeText</div>
<div class="class2">SomeText</div>
Answer 0 (score: 1)
Try using getElementsByTag and write the result to the file at the position where you want it.
Answer 1 (score: 0)
// As in the question, `document` is the parsed Jsoup Document
// and `filePath` is the file being written.
File input = new File(filePath);
PrintWriter writer = new PrintWriter(input, "UTF-8");
writer.write(document.getElementById("1").outerHtml() + "\n");
// Write only the script whose body starts with the expected function header.
Elements scripts = document.getElementsByTag("script");
for (Element script : scripts) {
    if (script.data().startsWith("(function(a, b)")) {
        writer.write(script.outerHtml() + "\n");
    }
}
writer.write(document.getElementsByClass("class1").outerHtml() + "\n");
writer.write(document.getElementsByClass("class2").outerHtml() + "\n");
writer.flush();
writer.close();
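If matching on the script text feels fragile, an alternative is to walk the siblings in document order, starting at the element with id "1" and stopping at the first element with class "class2". The following is only a minimal sketch of that idea, not something taken from the answers above: the class name SiblingRange, the helper extractRange, and the placeholder script bodies in main are hypothetical, and it assumes the wanted elements are all siblings under the same parent, as in the question's page.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class SiblingRange {

    // Hypothetical helper: returns the outer HTML of every element from the one
    // with id `startId` through the first following sibling that has class
    // `endClass`, in document order. Returns an empty string if the id is absent.
    public static String extractRange(Document document, String startId, String endClass) {
        StringBuilder out = new StringBuilder();
        Element current = document.getElementById(startId);
        while (current != null) {
            out.append(current.outerHtml()).append("\n");
            if (current.hasClass(endClass)) {
                break; // reached the last wanted element
            }
            current = current.nextElementSibling();
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Placeholder page: the script bodies are stand-ins, not the real scripts.
        String html = "<html><head></head><body>"
                + "<div id=\"1\">SomeText</div>"
                + "<script>var first = 1;</script>"
                + "<div class=\"class1\">SomeText</div>"
                + "<div class=\"class2\">SomeText</div>"
                + "<script>var second = 2;</script>"
                + "<div class=\"class3\">SomeText</div>"
                + "</body></html>";
        Document document = Jsoup.parse(html);
        System.out.print(extractRange(document, "1", "class2"));
    }
}
```

The returned string can be passed to writer.write(...) in place of the separate calls. Because the walk stops at "class2", the second script and the later divs are left out without inspecting any script content.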