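I am writing a Java relay proxy service that sits between the browser and the internet as middleware. Its only purpose is to observe the web requests passed from the browser and the responses returned to it, and to parse those responses offline later.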
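My Java proxy listens on a specific socket for connections from the browser. When a new connection arrives, it reads the browser's request headers, identifies the host to connect to, opens a connection to that host, and forwards the browser's request. The code that parses the browser request and relays the server response is the streamHTTPData() method given below. In the code, debugOut is the standard System.out.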
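The header-handling step described above (read the request head, then pick out the host) could look roughly like the following minimal sketch. `RequestHead` and `parse` are hypothetical names introduced for illustration only; they are not part of the proxy's actual code, and the sketch assumes the header lines have already been read and split.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helper (assumption, not from the original proxy): holds the
// three pieces of a request head that the proxy needs for routing.
class RequestHead {
    final String method;
    final String target;
    final String host; // null when no Host header is present

    RequestHead(String method, String target, String host) {
        this.method = method;
        this.target = target;
        this.host = host;
    }

    // Parses a request line followed by header lines (without line endings).
    static RequestHead parse(List<String> lines) {
        if (lines.isEmpty()) {
            throw new IllegalArgumentException("empty request head");
        }
        String[] parts = lines.get(0).trim().split("\\s+");
        if (parts.length != 3) {
            throw new IllegalArgumentException("malformed request line: " + lines.get(0));
        }
        String host = null;
        for (String line : lines.subList(1, lines.size())) {
            int colon = line.indexOf(':');
            if (colon <= 0) {
                continue; // not a "Name: value" line
            }
            // Compare the whole header name, so that a header such as
            // "X-Forwarded-Host" is not mistaken for "Host".
            String name = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            if (name.equals("host")) {
                host = line.substring(colon + 1).trim();
            }
        }
        return new RequestHead(parts[0], parts[1], host);
    }

    public static void main(String[] args) {
        RequestHead head = parse(Arrays.asList(
                "GET / HTTP/1.1",
                "Host: andhrawatch.com",
                "X-Forwarded-Host: other.example"));
        System.out.println(head.method + " " + head.target + " -> " + head.host);
    }
}
```

Matching on the full header name is a deliberate choice here: a substring search for "host:" anywhere in the line would also fire on headers whose names merely end in "host".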
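The code works for most websites, but a strange problem shows up on a few sites where I cannot view the home page. I noticed it while following Google search links at random and landing on a forum. Using the HttpFox extension in Firefox, I saw that the request the browser sends to the Java program, and from there on to the web server, is exactly the same. However, I get an HTTP 200 response when the Java middleware is not in use and an HTTP 404 when it is. I am not sure what the problem is. Can anyone point me in the right direction? The HTTP requests and responses captured by HttpFox are shown below.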
private int streamHTTPData(InputStream in, OutputStream out,
        StringBuffer host, StringBuffer url, boolean waitForDisconnect) {
    // get the HTTP data from an InputStream, and send it to
    // the designated OutputStream
    StringBuffer header = new StringBuffer("");
    String data = "";
    int responseCode = 200;
    int contentLength = 0;
    int pos = -1;
    int byteCount = 0;
    try {
        // get the first line of the header, so we know the response code
        data = readLine(in);
        if (data != null) {
            header.append(data + "\r\n");
            pos = data.indexOf(" ");
            if ((data.toLowerCase().startsWith("http")) && (pos >= 0)
                    && (data.indexOf(" ", pos + 1) >= 0)) {
                // status line of a response, e.g. "HTTP/1.1 200 OK"
                String rcString = data.substring(pos + 1,
                        data.indexOf(" ", pos + 1));
                try {
                    responseCode = Integer.parseInt(rcString);
                } catch (Exception e) {
                    if (debugLevel > 0)
                        debugOut.println("Error parsing response code "
                                + rcString);
                }
            } else {
                // request line, e.g. "GET <target> HTTP/1.1":
                // remember the request target
                if ((pos >= 0) && (data.indexOf(" ", pos + 1) >= 0)) {
                    String suffix = data.substring(pos + 1,
                            data.indexOf(" ", pos + 1));
                    url.setLength(0);
                    url.append(suffix.trim());
                }
            }
        }
        // get the rest of the header info
        while ((data = readLine(in)) != null) {
            // the header ends at the first blank line
            if (data.length() == 0)
                break;
            header.append(data + "\r\n");
            // check for the Host header
            pos = data.toLowerCase().indexOf("host:");
            if (pos >= 0) {
                host.setLength(0);
                host.append(data.substring(pos + 5).trim());
            }
            // check for the Content-Length header
            pos = data.toLowerCase().indexOf("content-length:");
            if (pos >= 0)
                contentLength = Integer.parseInt(data.substring(pos + 15)
                        .trim());
        }
        // add a blank line to terminate the header info
        header.append("\r\n");
        // convert the header to a byte array, and write it to our stream;
        // encode explicitly and write the whole array, because with the
        // platform default charset the byte count can differ from
        // header.length()
        byte[] headerBytes = header.toString().getBytes("ISO-8859-1");
        out.write(headerBytes, 0, headerBytes.length);
        System.out.println(header.toString());
        // if the header indicated that this was not a 200 response,
        // just return what we've got if there is no Content-Length,
        // because we may not be getting anything else
        if ((responseCode != 200) && (contentLength == 0)) {
            out.flush();
            return header.length();
        }
        // get the body, if any; we try to use the Content-Length header to
        // determine how much data we're supposed to be getting, because
        // sometimes the client/server won't disconnect after sending us
        // information...
        if (contentLength > 0)
            waitForDisconnect = false;
        if ((contentLength > 0) || (waitForDisconnect)) {
            try {
                byte[] buf = new byte[4096];
                int bytesIn = 0;
                while (((byteCount < contentLength) || (waitForDisconnect))
                        && ((bytesIn = in.read(buf)) >= 0)) {
                    out.write(buf, 0, bytesIn);
                    out.flush();
                    byteCount += bytesIn;
                }
            } catch (Exception e) {
                String errMsg = "Error getting HTTP body: " + e;
                if (debugLevel > 0)
                    debugOut.println(errMsg);
            }
        }
    } catch (Exception e) {
        if (debugLevel > 0)
            debugOut.println("Error getting HTTP data: " + e);
    }
    // flush the OutputStream and return
    try {
        out.flush();
    } catch (Exception e) {
        // nothing more can be done if the final flush fails
    }
    return (header.length() + byteCount);
}
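One thing worth checking, stated as an assumption rather than a confirmed cause: a browser that is configured to use an HTTP proxy normally sends the full URL in the request line (for example `GET http://andhrawatch.com/ HTTP/1.1`), and the method above forwards that first line unchanged. Some server-side applications only route correctly when they receive a path-only request target. The sketch below shows how a request line could be rewritten to the path-only form before forwarding. `RequestLineRewriter` and `toOriginForm` are hypothetical names, not part of the proxy's code.

```java
import java.net.URI;
import java.util.Locale;

// Hypothetical helper (assumption, not from the original proxy).
class RequestLineRewriter {

    // Turns "GET http://host/path?q HTTP/1.1" into "GET /path?q HTTP/1.1".
    // Lines that are not in absolute form are returned unchanged.
    static String toOriginForm(String requestLine) {
        String[] parts = requestLine.trim().split("\\s+");
        if (parts.length != 3) {
            return requestLine; // not "METHOD target VERSION"
        }
        String target = parts[1];
        String lower = target.toLowerCase(Locale.ROOT);
        if (!lower.startsWith("http://") && !lower.startsWith("https://")) {
            return requestLine; // already path-only (or "*", or CONNECT authority)
        }
        URI uri;
        try {
            uri = URI.create(target);
        } catch (IllegalArgumentException e) {
            return requestLine; // leave unparsable targets alone
        }
        String path = uri.getRawPath();
        if (path == null || path.isEmpty()) {
            path = "/"; // "http://host" with no path means the root
        }
        if (uri.getRawQuery() != null) {
            path += "?" + uri.getRawQuery();
        }
        return parts[0] + " " + path + " " + parts[2];
    }

    public static void main(String[] args) {
        System.out.println(toOriginForm("GET http://andhrawatch.com/ HTTP/1.1"));
        System.out.println(toOriginForm("GET /index.php?id=1 HTTP/1.1"));
    }
}
```

Printing the exact first line that the proxy writes to the server socket, and comparing it with the `GET /` shown in the capture, would confirm or rule out this difference. The raw path and query are used so that percent-encoding in the URL is passed through untouched.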
HTTP request (with and without the middleware):
(Request-Line) GET / HTTP/1.1
Host andhrawatch.com
User-Agent Mozilla/5.0 (Windows NT 6.1; WOW64; rv:13.0) Gecko/20100101 Firefox/13.0.1
Accept text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language en-us,en;q=0.5
Accept-Encoding gzip, deflate
Proxy-Connection keep-alive
HTTP response without the Java middleware:
(Status-Line) HTTP/1.1 200 OK
Date Fri, 27 Jul 2012 03:51:38 GMT
Server Apache/2.2.14 (Unix) mod_ssl/2.2.14 OpenSSL/0.9.8e-fips-rhel5 mod_auth_passthrough/2.1 mod_bwlimited/1.4 FrontPage/5.0.2.2635
X-Powered-By PHP/5.3.1
P3P CP="NOI ADM DEV PSAi COM NAV OUR OTRo STP IND DEM"
Expires Mon, 1 Jan 2001 00:00:00 GMT
Cache-Control post-check=0, pre-check=0
Pragma no-cache
Set-Cookie 0f486952816b6d6bf53a4c34b724b278=c68edaebc6dedb2b291832dfbfb784fc; path=/
Last-Modified Fri, 27 Jul 2012 03:51:38 GMT
Keep-Alive timeout=5, max=100
Connection Keep-Alive
Transfer-Encoding chunked
Content-Type text/html; charset=utf-8
HTTP response with the Java middleware:
(Status-Line) HTTP/1.1 404 Component not found
Date Fri, 27 Jul 2012 03:54:39 GMT
Server Apache/2.2.14 (Unix) mod_ssl/2.2.14 OpenSSL/0.9.8e-fips-rhel5 mod_auth_passthrough/2.1 mod_bwlimited/1.4 FrontPage/5.0.2.2635
X-Powered-By PHP/5.3.1
P3P CP="NOI ADM DEV PSAi COM NAV OUR OTRo STP IND DEM"
Expires Mon, 1 Jan 2001 00:00:00 GMT
Cache-Control post-check=0, pre-check=0
Pragma no-cache
Set-Cookie 0f486952816b6d6bf53a4c34b724b278=33806d89181aa6d488ccba1b9163e511; path=/
Last-Modified Fri, 27 Jul 2012 03:54:39 GMT
Transfer-Encoding chunked
Content-Type text/html; charset=utf-8
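Both captured responses carry `Transfer-Encoding chunked` and no `Content-Length`, so the end of the body is marked by a zero-size chunk rather than by a byte count or a disconnect. A minimal sketch of relaying such a body unchanged is given below. `ChunkedRelay` and its methods are hypothetical names, and the `readLine` here is a stand-in written for this example, not the proxy's own `readLine`.

```java
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical helper (assumption, not from the original proxy).
class ChunkedRelay {

    // Reads one line, dropping CR and LF; returns null at end of stream.
    static String readLine(InputStream in) throws IOException {
        ByteArrayOutputStream line = new ByteArrayOutputStream();
        int b;
        while ((b = in.read()) >= 0) {
            if (b == '\n') break;
            if (b != '\r') line.write(b);
        }
        if (b < 0 && line.size() == 0) return null;
        return line.toString("ISO-8859-1");
    }

    static void writeLine(OutputStream out, String line) throws IOException {
        out.write((line + "\r\n").getBytes("ISO-8859-1"));
    }

    // Copies exactly count bytes, failing if the stream ends early.
    static void copyExactly(InputStream in, OutputStream out, int count)
            throws IOException {
        byte[] buf = new byte[4096];
        int remaining = count;
        while (remaining > 0) {
            int n = in.read(buf, 0, Math.min(buf.length, remaining));
            if (n < 0) throw new EOFException("stream ended inside a chunk");
            out.write(buf, 0, n);
            remaining -= n;
        }
    }

    // Relays a chunked body, framing included, and returns the bytes written.
    static long relayChunked(InputStream in, OutputStream out) throws IOException {
        long total = 0;
        while (true) {
            String sizeLine = readLine(in);
            if (sizeLine == null) throw new EOFException("missing chunk size");
            writeLine(out, sizeLine);
            total += sizeLine.length() + 2;
            // The size is hexadecimal; anything after ';' is a chunk extension.
            int semi = sizeLine.indexOf(';');
            String hex = (semi >= 0 ? sizeLine.substring(0, semi) : sizeLine).trim();
            int size = Integer.parseInt(hex, 16);
            if (size == 0) break; // last chunk
            copyExactly(in, out, size + 2); // chunk data plus its trailing CRLF
            total += size + 2;
        }
        // Optional trailer lines, terminated by an empty line.
        String trailer;
        while ((trailer = readLine(in)) != null) {
            writeLine(out, trailer);
            total += trailer.length() + 2;
            if (trailer.isEmpty()) break;
        }
        out.flush();
        return total;
    }
}
```

Because the relay stops after the terminating empty line, a keep-alive connection can be reused for the next response instead of waiting for the server to disconnect.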