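I have a list of web pages (more than 100) that I need to fetch and collect data from. I decided to save the HTML of all these pages into a single file and then use Jsoup to find the data I'm interested in.
The problem is that I don't know how to run 100 threads and save their responses into one file. Any ideas?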
Answer 0 (score: 0)
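It may not be a masterpiece, but it works, and I wanted to keep it as simple as possible. It fetches the links one after another on a background thread and saves each result as it arrives.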
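    ArrayList<String> links = new ArrayList<>();
    Elements myDiv;
    private int repeat = 0; // index of the link currently being fetched

    // Called on the UI thread after each fetch finishes.
    // Assumes the chain is started elsewhere with getDetails(links.get(0)).
    private void saveDetails() throws IOException {
        // myDiv is null when the fetch failed, so only save real results.
        if (myDiv != null) {
            saveFile(myDiv.toString());
            myDiv = null;
        }
        repeat++;
        textView.setText(String.valueOf(repeat));
        if (repeat < links.size()) {
            // Check the bound after incrementing to avoid an IndexOutOfBoundsException.
            getDetails(links.get(repeat));
        } else {
            textView.setText("finished");
        }
    }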
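    // Fetches one page on a background thread, then notifies detailsHandler.
    private void getDetails(String urlStr) {
        final String detailsUrl = urlStr;
        new Thread() {
            @Override
            public void run() {
                Message msg = Message.obtain();
                try {
                    Document doc = Jsoup.connect(detailsUrl).get();
                    myDiv = doc.select(".exhibitor-contact");
                } catch (IOException e1) {
                    // On failure myDiv is left unset (null).
                    e1.printStackTrace();
                }
                // Always notify the handler so the chain moves on to the next link.
                detailsHandler.sendMessage(msg);
            }
        }.start();
    }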
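    // Runs on the main thread: saves the finished page and starts the next fetch.
    // The no-argument Handler() constructor is deprecated, so the main Looper
    // (android.os.Looper) is passed explicitly.
    private final Handler detailsHandler = new Handler(Looper.getMainLooper()) {
        @Override
        public void handleMessage(Message msg) {
            super.handleMessage(msg);
            try {
                saveDetails();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    };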
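If you do want several downloads in flight at once, as the question asks, a fixed-size thread pool is usually easier to manage than 100 raw threads. Below is a minimal sketch under some assumptions: `PageCollector`, `collect` and the `fetch` parameter are hypothetical names, not part of Jsoup or Android. In real use `fetch` could wrap `Jsoup.connect(url).get()` and rethrow the `IOException` as an unchecked exception. Only the calling thread writes to the output, so the single file needs no locking.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical helper for illustration only.
class PageCollector {

    /**
     * Fetches every URL on a fixed-size thread pool and writes the results
     * to one Writer, in the same order as the input list.
     * Returns the number of pages that were written successfully.
     */
    static int collect(List<String> urls, Function<String, String> fetch,
                       Writer out, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Submit all downloads first so they can run concurrently.
            List<Future<String>> pending = new ArrayList<>();
            for (String url : urls) {
                Callable<String> task = () -> fetch.apply(url);
                pending.add(pool.submit(task));
            }
            // Only this thread touches the Writer, so no synchronization is needed.
            int saved = 0;
            for (int i = 0; i < urls.size(); i++) {
                try {
                    String body = pending.get(i).get(); // blocks until page i is done
                    out.write("== " + urls.get(i) + " ==\n" + body + "\n");
                    saved++;
                } catch (ExecutionException e) {
                    // One failed page should not abort the remaining ones.
                    System.err.println("Failed: " + urls.get(i) + ": " + e.getCause());
                }
            }
            out.flush();
            return saved;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for downloads", e);
        } finally {
            pool.shutdownNow();
        }
    }
}
```

To write to a file, pass something like a `FileWriter` as `out`. The method blocks until every page is done, so on Android it should be called from a background thread rather than the UI thread. A pool of around 8 to 10 threads is a reasonable starting point, since 100 simultaneous connections to the same site tend to get throttled.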
Answer 1 (score: 0)
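You don't need to save all the pages into one file and then process that file. You can collect the information page by page. Here is my suggestion: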
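    // Needs org.jsoup.Jsoup, org.jsoup.nodes.Document, java.util.Arrays, java.util.List, java.io.IOException
    List<String> urls = Arrays.asList(/* your ~100 site URLs */);
    for (String url : urls) {
        try {
            Document doc = Jsoup.connect(url).get();
            // Now process doc as you want (doc.toString() with a regular expression, for example)
            // and save the information you need.
        } catch (IOException e) {
            e.printStackTrace(); // skip pages that fail to load
        }
    }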
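To make the "process with a regular expression" step concrete, here is a small sketch that pulls e-mail addresses out of a page's HTML. It is only an illustration: `ContactExtractor`, `extractEmails` and the pattern itself are assumptions, and the pattern is deliberately simple rather than a complete e-mail grammar. A regular expression works for flat patterns like this. For anything that depends on the page structure, Jsoup selectors such as `doc.select(...)` are likely to be more robust.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; the name and the pattern are assumptions.
class ContactExtractor {

    // Deliberately simple e-mail pattern; it does not cover every valid address.
    private static final Pattern EMAIL =
            Pattern.compile("[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}");

    /** Returns the distinct e-mail addresses found in the HTML, in order of first appearance. */
    static List<String> extractEmails(String html) {
        // LinkedHashSet drops duplicates (e.g. a mailto: link plus its visible text)
        // while keeping the order in which addresses were first seen.
        Set<String> found = new LinkedHashSet<>();
        Matcher m = EMAIL.matcher(html);
        while (m.find()) {
            found.add(m.group());
        }
        return new ArrayList<>(found);
    }
}
```

Inside the loop you would call `ContactExtractor.extractEmails(doc.toString())` and append the returned addresses to your output file.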