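I have a lab assignment to build a crawler using the BSD sockets API, so I need to issue multiple HTTP requests and extract all of the responses. I tried to do this over a single socket connection, but I only get a response to the first request I send; the other responses are empty. Here is my code. What is the solution?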
Socket socket = new Socket("fucking-great-advice.ru", 80);
BufferedReader input = new BufferedReader(new InputStreamReader(socket.getInputStream()));
PrintWriter output = new PrintWriter(socket.getOutputStream());
for (int numberAdvice = 1; numberAdvice < 100; numberAdvice++) {
output.write("GET /advice/" + numberAdvice + " HTTP/1.0\r\n\r\n");
output.flush();
StringBuilder sb = new StringBuilder();
int ch = 0;
while ((ch = input.read()) != -1) {
sb.append((char) ch);
}
String response = sb.toString().split("\r\n\r\n")[1];
System.out.println(response);
}
input.close();
output.close();
socket.close();
Answer 0 (score: 1)
There are several problems in your current code:
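- You need to send the Host header to avoid getting a 404 error.
- You read from the InputStream until you get -1, which means you implicitly expect to reach the end of the stream (the stream being closed). That is not what you want, since you are trying to keep querying the server.
- You need to send the Connection: keep-alive header to tell the server not to close the connection after it replies.

So the request becomes: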
output.write(
String.format(
"GET /advice/%d HTTP/1.1\r\nHost: fucking-great-advice.ru\r\nConnection: keep-alive\r\n\r\n",
numberAdvice
)
);
output.flush();
Here is how to read and display the reply:
if (numberAdvice > 1) {
// Skip inter responses empty line
input.readLine();
}
StringBuilder sb = new StringBuilder();
String line;
boolean started = false;
while ((line = input.readLine()) != null) {
if (!started) {
// Here we check if we reached the end of the header
if (line.isEmpty()) {
// Here the body starts
started = true;
// Skip chunk start
input.readLine();
}
continue;
}
if ("0".equals(line)) {
// Reached chunk end
break;
}
sb.append(line);
}
System.out.println(sb);
NB: This code is neither optimal nor perfect; it only shows the general idea.
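
If you want something a little less tied to this one server's output, one option is to let the response headers say where each body ends: read Content-Length bytes when that header is present, or decode the chunks when Transfer-Encoding: chunked is used. The following is only a sketch under assumptions. The class and method names (KeepAliveReader, readBody, readLine, readExactly) are made up for illustration, the body is assumed to be UTF-8, and responses that are delimited only by the server closing the connection are not handled. It works on the raw InputStream rather than a BufferedReader so that byte counts are not confused with character counts.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

public class KeepAliveReader {

    /** Reads one line terminated by LF (CR is dropped); returns null at end of stream. */
    static String readLine(InputStream in) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        int b;
        while ((b = in.read()) != -1) {
            if (b == '\n') {
                break;
            }
            if (b != '\r') {
                buf.write(b);
            }
        }
        if (b == -1 && buf.size() == 0) {
            return null;
        }
        return new String(buf.toByteArray(), StandardCharsets.ISO_8859_1);
    }

    /** Reads exactly n bytes, or throws if the stream ends first. */
    static byte[] readExactly(InputStream in, int n) throws IOException {
        byte[] data = new byte[n];
        int off = 0;
        while (off < n) {
            int r = in.read(data, off, n - off);
            if (r == -1) {
                throw new EOFException("stream ended after " + off + " of " + n + " bytes");
            }
            off += r;
        }
        return data;
    }

    /** Reads one response and returns its body, leaving the stream at the start of the next response. */
    static String readBody(InputStream in) throws IOException {
        if (readLine(in) == null) { // status line, e.g. "HTTP/1.1 200 OK"
            throw new EOFException("connection closed before a response arrived");
        }
        int contentLength = -1;
        boolean chunked = false;
        String line;
        // Headers end at the first empty line.
        while ((line = readLine(in)) != null && !line.isEmpty()) {
            String lower = line.toLowerCase(Locale.ROOT);
            if (lower.startsWith("content-length:")) {
                contentLength = Integer.parseInt(line.substring(line.indexOf(':') + 1).trim());
            } else if (lower.startsWith("transfer-encoding:") && lower.contains("chunked")) {
                chunked = true;
            }
        }
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        if (chunked) {
            while (true) {
                String sizeLine = readLine(in);
                if (sizeLine == null) {
                    throw new EOFException("stream ended inside a chunked body");
                }
                int semi = sizeLine.indexOf(';'); // ignore optional chunk extensions
                String hex = (semi >= 0 ? sizeLine.substring(0, semi) : sizeLine).trim();
                int size = Integer.parseInt(hex, 16);
                if (size == 0) {
                    break; // last chunk
                }
                body.write(readExactly(in, size));
                readLine(in); // CRLF that follows the chunk data
            }
            // Skip optional trailer fields up to the closing empty line.
            while ((line = readLine(in)) != null && !line.isEmpty()) {
                // nothing to do
            }
        } else if (contentLength >= 0) {
            body.write(readExactly(in, contentLength));
        }
        return new String(body.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Two canned responses back to back, as they would arrive on a kept-alive connection.
        String raw = "HTTP/1.1 200 OK\r\nContent-Length: 5\r\n\r\nfirst"
                + "HTTP/1.1 200 OK\r\nTransfer-Encoding: chunked\r\n\r\n3\r\nsec\r\n3\r\nond\r\n0\r\n\r\n";
        InputStream in = new ByteArrayInputStream(raw.getBytes(StandardCharsets.ISO_8859_1));
        System.out.println(readBody(in));
        System.out.println(readBody(in));
    }
}
```

In the crawler loop you would pass socket.getInputStream() to readBody once per request, after flushing the request, instead of the canned stream used in main.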