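I'm new to jsoup and need to use it, but I've run into a problem: only a limited number of links get crawled. I'm crawling http://shais.net/ and I only see 35 absolute URLs, while the site has at least 430 links. Here is my code: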

    public static void main(String[] args) throws SQLException, IOException {
        PreparedStatement statement = db.Connection.connection.prepareStatement("truncate record;");
        statement.execute();
        processPage("http://shais.net/"); //TODO
    }

    public static void processPage(String URL) throws SQLException, IOException {
        String sql = "select * from Record where URL = '" + URL + "'";
        PreparedStatement select = db.Connection.connection.prepareStatement(sql);
        ResultSet result = select.executeQuery();
        if (result.next()) {
        } else {
            sql = "insert into record" + " (URL) values" + "('" + URL + "')";
            PreparedStatement statement = db.Connection.connection.prepareStatement(sql, Statement.RETURN_GENERATED_KEYS);
            statement.execute();
            org.jsoup.nodes.Document doc = Jsoup.connect("http://shais.net/").header("Accept-Encoding", "gzip, deflate") //TODO
                    .userAgent("Mozilla/5.0 (Windows NT 6.1; WOW64; rv:23.0) Gecko/20100101 Firefox/23.0")
                    .maxBodySize(0)
                    .timeout(600000).get();
            if (doc.text().contains("research")) {
                System.out.println(URL);
            }
            Elements questions = doc.select("a[href]");
            for (Element link : questions) {
                if (link.attr("href").contains("shais.net"))
                    processPage(link.attr("abs:href"));
                System.out.println(link.attr("abs:href"));
            }
        }
    }

Please help me find where the problem is.
Thanks.
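
Two things in the code above look like they could explain the low count, though I have not confirmed either against the live site. First, `processPage` always fetches the hard-coded `"http://shais.net/"` instead of its `URL` parameter, so every call parses the same front page. Second, `link.attr("href")` returns the raw attribute value, so a relative link such as `/about.html` never contains `shais.net` and is skipped by the filter; `link.attr("abs:href")` (or `link.absUrl("href")`) returns the resolved address. Below is a minimal sketch of the filtering step using only `java.net.URI`, so it runs without jsoup. The class name `LinkFilter`, the method `resolveIfSameSite`, and the sample paths are made up for illustration.

```java
import java.net.URI;

public class LinkFilter {

    /**
     * Resolves href against the page it was found on. Returns the absolute
     * URL when it points at the given host (or one of its subdomains),
     * otherwise null. Hypothetical helper, not part of jsoup.
     */
    static String resolveIfSameSite(String pageUrl, String href, String host) {
        try {
            URI resolved = URI.create(pageUrl).resolve(href.trim());
            String linkHost = resolved.getHost();
            if (linkHost == null) {
                return null; // mailto:, javascript: and similar have no host
            }
            linkHost = linkHost.toLowerCase();
            boolean sameSite = linkHost.equals(host) || linkHost.endsWith("." + host);
            return sameSite ? resolved.toString() : null;
        } catch (IllegalArgumentException e) {
            return null; // malformed href, e.g. one containing spaces
        }
    }

    public static void main(String[] args) {
        String page = "http://shais.net/";
        // Sample hrefs invented for the example; the real site may differ.
        String[] hrefs = {
            "/about.html",
            "news/item1.html",
            "http://shais.net/contact",
            "http://example.com/",
            "mailto:someone@example.com"
        };
        for (String href : hrefs) {
            System.out.println(href + " -> " + resolveIfSameSite(page, href, "shais.net"));
        }
    }
}
```

In the crawler itself the same check could be applied to `link.attr("abs:href")`, and `Jsoup.connect(URL)` would replace the hard-coded address. Comparing hosts rather than calling `contains("shais.net")` on the whole string also avoids matching unrelated URLs that merely mention the domain in a path or query.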