Removing scripts with jsoup when reading from a URL

Date: 2015-10-12 01:58:50

Tags: java jsoup

I want to remove the scripts when reading from a URL rather than from a file. Please help.

    Document connect = Jsoup.connect("http://www.tutorialspoint.com/ant/ant_deploying_applications.htm");
    Elements selects = connect.select("div.middle-col");
    System.out.println(selects.removeAttr("script").html());

2 answers:

Answer 0 (score: 5)

Here is how you can remove the script elements:

import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class TestJsoup {
    public static void main(String[] args) throws IOException {
        // Jsoup.connect() only builds the request; get() fetches and parses the page.
        Document doc = Jsoup.connect("http://www.tutorialspoint.com/ant/ant_deploying_applications.htm").get();

        Elements selects = doc.select("div.middle-col");
        for (Element column : selects) {
            // <script> is an element, not an attribute, so select it and remove it from the tree.
            Elements scripts = column.select("script");
            scripts.remove();
        }
        System.out.println(selects.html());
    }
}
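
To see why the question's removeAttr("script") call has no effect, here is a minimal sketch that runs against an in-memory HTML string instead of the live page. The class name ScriptRemovalSketch, the helper method names, and the sample markup are all made up for illustration; only the jsoup calls themselves (Jsoup.parse, select, removeAttr, remove, html) come from the library.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

// Hypothetical helper class, not part of the original answer.
class ScriptRemovalSketch {

    // Mirrors the question: removeAttr only strips an attribute named "script",
    // so <script> child elements are left untouched.
    static String withRemoveAttr(String html) {
        Document doc = Jsoup.parse(html);
        Elements columns = doc.select("div.middle-col");
        return columns.removeAttr("script").html();
    }

    // Mirrors the answer: select the <script> elements inside the column
    // and remove them from the document tree.
    static String withElementRemoval(String html) {
        Document doc = Jsoup.parse(html);
        doc.select("div.middle-col script").remove();
        return doc.select("div.middle-col").html();
    }

    public static void main(String[] args) {
        // Sample markup assumed for the demo.
        String sample = "<div class=\"middle-col\"><p>Text</p>"
                + "<script>alert('x');</script></div>";
        System.out.println(withRemoveAttr(sample));
        System.out.println(withElementRemoval(sample));
    }
}
```

The descendant selector "div.middle-col script" does the same job as the loop in the answer in a single call, which may be easier to read when no per-element logic is needed.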

Answer 1 (score: 3)

Alternatively, you can use Jsoup.clean(html, whitelist), which drops every tag that is not on the whitelist, including script.
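
A minimal sketch of the Jsoup.clean approach, assuming a recent jsoup release where the allow-list class is named Safelist (older releases, including those current when this answer was written, call it Whitelist). The class name CleanSketch, the method stripScripts, and the sample markup are hypothetical. Note that clean is a sanitizer: besides script tags it also strips any other tag or attribute the chosen list does not allow, so it changes more of the markup than the element-removal approach.

```java
import org.jsoup.Jsoup;
import org.jsoup.safety.Safelist;

// Hypothetical helper class, not part of the original answer.
class CleanSketch {

    // Safelist.relaxed() permits common structural tags such as div and p,
    // but not script, so script elements are dropped from the result.
    static String stripScripts(String html) {
        return Jsoup.clean(html, Safelist.relaxed());
    }

    public static void main(String[] args) {
        // Sample markup assumed for the demo.
        String sample = "<div><p>Keep me</p><script>alert('x');</script></div>";
        System.out.println(stripScripts(sample));
    }
}
```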