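I am trying to read the page source of a website, opening it with a different ID each time. I manage to read 5-6 pages, but after that I only get a service notice page: "Please enable browser cookies to view this site". I know I need to manage cookies somehow, but nothing I have tried works.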
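By default `HttpURLConnection` does not keep cookies between requests: unless a `java.net.CookieHandler` is installed, a cookie the server sets in one response is never sent back on the next connection. The following is a minimal offline sketch of the bookkeeping `java.net.CookieManager` does, assuming the site delivers its cookie in an ordinary `Set-Cookie` response header. The URLs, the class and method names, and the `session=abc123` value are hypothetical stand-ins, not taken from the real site.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.CookieManager;
import java.net.CookiePolicy;
import java.net.URI;
import java.util.Collections;
import java.util.List;
import java.util.Map;

public class CookieRoundTrip {

    // Feeds one simulated response header into the manager, then returns the
    // Cookie header values the manager would attach to the next request.
    static List<String> cookieHeaderForNextRequest(CookieManager manager, String setCookieValue) {
        // Hypothetical URLs standing in for link + id.
        URI firstPage = URI.create("http://example.com/page?id=1");
        URI nextPage = URI.create("http://example.com/page?id=2");
        try {
            // What a server response carrying a cookie would contain.
            Map<String, List<String>> responseHeaders =
                    Collections.singletonMap("Set-Cookie", Collections.singletonList(setCookieValue));
            manager.put(firstPage, responseHeaders);

            // Headers the manager would add to the following request.
            Map<String, List<String>> requestHeaders =
                    manager.get(nextPage, Collections.<String, List<String>>emptyMap());
            List<String> cookie = requestHeaders.get("Cookie");
            return cookie == null ? Collections.<String>emptyList() : cookie;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // null store = use the built-in in-memory cookie store.
        CookieManager manager = new CookieManager(null, CookiePolicy.ACCEPT_ALL);
        System.out.println(cookieHeaderForNextRequest(manager, "session=abc123; Path=/"));
    }
}
```

If the real cookie is set by JavaScript on the notice page rather than by a response header, `CookieManager` never sees it, because `HttpURLConnection` does not run scripts.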
Here is my code:
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

// ids, link and path are fields of the enclosing class (not shown)
public void read_and_save_pages() {
    for (String id : ids) {
        try {
            // open url
            URL url = new URL(link + id);
            HttpURLConnection connection = (HttpURLConnection) url.openConnection();
            // set user agent
            connection.setRequestProperty(
                    "User-Agent",
                    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.76 Safari/537.36");
            // read the page source and write it to one file per id, so pages
            // do not overwrite each other; try-with-resources closes both
            // streams even when an exception is thrown
            try (BufferedReader in = new BufferedReader(new InputStreamReader(
                         connection.getInputStream(), "windows-1255"));
                 BufferedWriter out = new BufferedWriter(new FileWriter(path + id + ".html"))) {
                String line;
                while ((line = in.readLine()) != null) {
                    out.write(line + '\n');
                }
            }
        } catch (IOException e) {
            // report the failing id and continue with the next one
            System.err.println("Error for id " + id + ": " + e.getMessage());
        }
    }
}
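One way to make a loop like this send cookies back is to install a JVM-wide `CookieManager` once, before the first `openConnection()` call. `HttpURLConnection` consults `CookieHandler.getDefault()`, so connections opened afterwards store `Set-Cookie` headers and replay matching cookies automatically. This is a sketch under the assumption that the site's cookie arrives in a response header; the class name `CookieSetup` and the `dumpCookies` debug helper are hypothetical.

```java
import java.net.CookieHandler;
import java.net.CookieManager;
import java.net.CookiePolicy;
import java.net.HttpCookie;

public class CookieSetup {

    // Installs a JVM-wide cookie handler backed by the default in-memory store.
    // Connections opened after this call keep and resend cookies per site.
    static CookieManager installCookieManager() {
        CookieManager manager = new CookieManager(null, CookiePolicy.ACCEPT_ALL);
        CookieHandler.setDefault(manager);
        return manager;
    }

    // Hypothetical debug helper: prints every cookie collected so far.
    static void dumpCookies(CookieManager manager) {
        for (HttpCookie cookie : manager.getCookieStore().getCookies()) {
            System.out.println(cookie.getDomain() + ": " + cookie.getName() + "=" + cookie.getValue());
        }
    }

    public static void main(String[] args) {
        CookieManager manager = installCookieManager();
        // ... run the existing read_and_save_pages() loop here ...
        dumpCookies(manager);
    }
}
```

If `dumpCookies` prints nothing even after the "enable cookies" page appears, the site most likely sets its cookie from JavaScript, which `HttpURLConnection` never executes, so a cookie handler alone will not get past the notice.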