I want to create a thread that crawls all the links of a website and stores them in a LinkedHashSet, but when I print the size of this LinkedHashSet, it prints nothing. I have just started learning about crawling! I am working from The Art of Java. Here is my code:
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.MalformedURLException;
import java.net.URL;
import java.util.LinkedHashSet;
import java.util.logging.Level;
import java.util.logging.Logger;
public class TestThread {

    public void crawl(URL url) {
        try {
            BufferedReader reader = new BufferedReader(
                    new InputStreamReader(url.openConnection().getInputStream()));
            String line = reader.readLine();
            LinkedHashSet toCrawlList = new LinkedHashSet();
            while (line != null) {
                toCrawlList.add(line);
                System.out.println(toCrawlList.size());
            }
        } catch (IOException ex) {
            Logger.getLogger(TestThread.class.getName()).log(Level.SEVERE, null, ex);
        }
    }

    public static void main(String[] args) {
        final TestThread test1 = new TestThread();
        Thread thread = new Thread(new Runnable() {
            public void run() {
                try {
                    test1.crawl(new URL("http://stackoverflow.com/"));
                } catch (MalformedURLException ex) {
                    Logger.getLogger(TestThread.class.getName()).log(Level.SEVERE, null, ex);
                }
            }
        });
    }
}
Answer 0 (score: 0)
You should fill your list like this:
while ((line = reader.readLine()) != null) {
    toCrawlList.add(line);
}
System.out.println(toCrawlList.size());
Your original loop never calls readLine() again, so line never changes and the loop can never end. If this still doesn't work, try setting a breakpoint in your code and check whether the reader actually contains anything.
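There is also a second problem: main creates the Thread but never calls thread.start(), so run() never executes and nothing is printed at all. Below is a minimal corrected sketch combining both fixes; it assumes the goal is simply to read the page line by line and print the count once, and it adds generics and try-with-resources as optional cleanups that were not in the original code.

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.MalformedURLException;
import java.net.URL;
import java.util.LinkedHashSet;
import java.util.logging.Level;
import java.util.logging.Logger;

public class TestThread {

    public void crawl(URL url) {
        // Generic type for the set; try-with-resources closes the reader.
        LinkedHashSet<String> toCrawlList = new LinkedHashSet<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(url.openConnection().getInputStream()))) {
            String line;
            // readLine() is called inside the condition, so the loop
            // advances through the stream and ends at end-of-input.
            while ((line = reader.readLine()) != null) {
                toCrawlList.add(line);
            }
            // Print once, after the whole page has been read.
            System.out.println(toCrawlList.size());
        } catch (IOException ex) {
            Logger.getLogger(TestThread.class.getName()).log(Level.SEVERE, null, ex);
        }
    }

    public static void main(String[] args) {
        final TestThread test1 = new TestThread();
        Thread thread = new Thread(new Runnable() {
            public void run() {
                try {
                    test1.crawl(new URL("http://stackoverflow.com/"));
                } catch (MalformedURLException ex) {
                    Logger.getLogger(TestThread.class.getName()).log(Level.SEVERE, null, ex);
                }
            }
        });
        thread.start(); // without this, the thread never runs
    }
}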