我正在研究HTTP流量数据集,它由完整的POST和GET请求组成,如下所示。我在java中编写了代码,将每个请求分开并将其保存为数组列表中的字符串元素。 现在我很困惑如何在java中解析这些原始HTTP请求是否有比手动解析更好的方法?
GET http://localhost:8080/tienda1/imagenes/3.gif/ HTTP/1.1
User-Agent: Mozilla/5.0 (compatible; Konqueror/3.5; Linux) KHTML/3.5.8 (like Gecko)
Pragma: no-cache
Cache-control: no-cache
Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Accept-Encoding: x-gzip, x-deflate, gzip, deflate
Accept-Charset: utf-8, utf-8;q=0.5, *;q=0.5
Accept-Language: en
Host: localhost:8080
Cookie: JSESSIONID=FB018FFB06011CFABD60D8E8AD58CA21
Connection: close
答案 0 :(得分:13)
这是一个普通的Http请求解析器,适用于所有方法类型(GET,POST等),以方便您使用:
package util.dpi.capture;
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.Hashtable;
/**
* Class for HTTP request parsing as defined by RFC 2612:
*
* Request = Request-Line ; Section 5.1 (( general-header ; Section 4.5 |
* request-header ; Section 5.3 | entity-header ) CRLF) ; Section 7.1 CRLF [
* message-body ] ; Section 4.3
*
* @author izelaya
*
*/
public class HttpRequestParser {
private String _requestLine;
private Hashtable<String, String> _requestHeaders;
private StringBuffer _messagetBody;
public HttpRequestParser() {
_requestHeaders = new Hashtable<String, String>();
_messagetBody = new StringBuffer();
}
/**
* Parse and HTTP request.
*
* @param request
* String holding http request.
* @throws IOException
* If an I/O error occurs reading the input stream.
* @throws HttpFormatException
* If HTTP Request is malformed
*/
public void parseRequest(String request) throws IOException, HttpFormatException {
BufferedReader reader = new BufferedReader(new StringReader(request));
setRequestLine(reader.readLine()); // Request-Line ; Section 5.1
String header = reader.readLine();
while (header.length() > 0) {
appendHeaderParameter(header);
header = reader.readLine();
}
String bodyLine = reader.readLine();
while (bodyLine != null) {
appendMessageBody(bodyLine);
bodyLine = reader.readLine();
}
}
/**
*
* 5.1 Request-Line The Request-Line begins with a method token, followed by
* the Request-URI and the protocol version, and ending with CRLF. The
* elements are separated by SP characters. No CR or LF is allowed except in
* the final CRLF sequence.
*
* @return String with Request-Line
*/
public String getRequestLine() {
return _requestLine;
}
private void setRequestLine(String requestLine) throws HttpFormatException {
if (requestLine == null || requestLine.length() == 0) {
throw new HttpFormatException("Invalid Request-Line: " + requestLine);
}
_requestLine = requestLine;
}
private void appendHeaderParameter(String header) throws HttpFormatException {
int idx = header.indexOf(":");
if (idx == -1) {
throw new HttpFormatException("Invalid Header Parameter: " + header);
}
_requestHeaders.put(header.substring(0, idx), header.substring(idx + 1, header.length()));
}
/**
* The message-body (if any) of an HTTP message is used to carry the
* entity-body associated with the request or response. The message-body
* differs from the entity-body only when a transfer-coding has been
* applied, as indicated by the Transfer-Encoding header field (section
* 14.41).
* @return String with message-body
*/
public String getMessageBody() {
return _messagetBody.toString();
}
private void appendMessageBody(String bodyLine) {
_messagetBody.append(bodyLine).append("\r\n");
}
/**
* For list of available headers refer to sections: 4.5, 5.3, 7.1 of RFC 2616
* @param headerName Name of header
* @return String with the value of the header or null if not found.
*/
public String getHeaderParam(String headerName){
return _requestHeaders.get(headerName);
}
}
答案 1 :(得分:5)
我正在[一] HTTP流量数据集上工作,该流量数据集由完整的POST和GET请求组成[s]
因此,您要解析包含多个HTTP请求的文件或列表。您想要提取哪些数据?无论如何here是一个Java HTTP解析类,它可以读取请求行中使用的方法,版本和URI,并将所有头读入Hashtable。
如果您想重新发明轮子,可以使用那个或自己写一个。看看RFC,看看请求是什么样的,以便正确解析它:
Request = Request-Line ; Section 5.1
*(( general-header ; Section 4.5
| request-header ; Section 5.3
| entity-header ) CRLF) ; Section 7.1
CRLF
[ message-body ] ; Section 4.3
答案 2 :(得分:2)
如果您只想原始发送原始请求,那么非常简单,只需使用TCP套接字发送实际的字符串即可!
这样的事情:
Socket socket = new Socket(host, port);
BufferedWriter out = new BufferedWriter(
new OutputStreamWriter(socket.getOutputStream(), "UTF8"));
for (String line : getContents(request)) {
System.out.println(line);
out.write(line + "\r\n");
}
out.write("\r\n");
out.flush();
请参阅JoeJag的blog post获取完整代码。
<强>更新强>
我启动了一个项目,RawHTTP为请求,响应,标题等提供HTTP解析器......结果非常好,以至于可以很容易地在其上编写HTTP服务器和客户端。如果你正在寻找低水平的东西,请查看它。