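I am writing a Java server that handles HTTP requests using only the Socket class, because my professor said we may not use HTTP libraries (the goal is to learn HTTP...). So I decided to process the requests with regular expressions. The first thing the code does is take every line of the request and join them into a single string, which I then process with a pattern. I only need to implement the following methods: GET, POST, PUT, HEAD, DELETE. I am testing my program with Postman, a Google Chrome extension. Here are some examples of requests from Postman after I turn the message into a single string: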
GET:
GET / HTTP/1.1 Host: 127.0.0.1:15000 Connection: keep-alive Cache-Control: no-cache User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.101 Safari/537.36 Postman-Token: dd87e652-2b21-3632-30ad-ace26581d369 Accept: */* Accept-Encoding: gzip, deflate, sdch Accept-Language: en-US,en;q=0.8
POST without body:
POST / HTTP/1.1 Host: 127.0.0.1:15000 Connection: keep-alive Content-Length: 0 Cache-Control: no-cache Origin: chrome-extension://fhbjgbiflinjbdggehcddcbncdddomop User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.101 Safari/537.36 Postman-Token: 8094b5ce-4b3d-cee7-2d10-f5dd2bc6b7b2 Accept: */* Accept-Encoding: gzip, deflate Accept-Language: en-US,en;q=0.8
POST with body:
POST / HTTP/1.1 Host: 127.0.0.1:15000 Connection: keep-alive Content-Length: 9 Postman-Token: 3fb2f5e0-2df1-5af4-7853-e9de84648dd5 Cache-Control: no-cache Origin: chrome-extension://fhbjgbiflinjbdggehcddcbncdddomop User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.101 Safari/537.36 Content-Type: text/plain;charset=UTF-8 Accept: */* Accept-Encoding: gzip, deflate Accept-Language: en-US,en;q=0.8
And so on...
The patterns I wrote are:
String somethingPattern = "(.*)?";
String ipPattern = "(((2[0-4][0-9])|(25[0-5])|(1?[0-9]?[0-9]))\\.((2[0-4][0-9])|(25[0-5])|(1?[0-9]?[0-9]))\\.((2[0-4][0-9])|(25[0-5])|(1?[0-9]?[0-9]))\\.((2[0-4][0-9])|(25[0-5])|(1?[0-9]?[0-9]))|"+somethingPattern+")((:)\\d{3,})?"; // regex for an IP from 0.0.0.0 to 255.255.255.255 or some other string, optionally followed by : and a port number
String objetoPattern = "([/?a-zA-Z0-9\\.\\-_]+)"; // regex for a linux path to a file, including only letters, numbers and -_.
String connectionPattern = "(connection:\\s*"+somethingPattern+")?";
String contentLenPattern = "(content-length:\\s*([0-9]+))?";
String postmanTokenPattern = "(postman-token:\\s*"+somethingPattern+")?";
String cacheControlPattern = "(cache-control:\\s*"+somethingPattern+")?";
String originPattern = "(origin:\\s*"+somethingPattern+")?";
String userAgentPattern = "(user-agent:\\s*"+somethingPattern+")?";
String charsetPattern = "(charset="+somethingPattern+")?";
String contentTypePattern = "(content-type:\\s*"+somethingPattern+";"+charsetPattern+")?";
String acceptPattern = "(accept:\\s*"+somethingPattern+")?";
String acceptEncodingPattern = "(accept-encoding:\\s*"+somethingPattern+")?";
String acceptLanguagePattern = "(accept-language:\\s*"+somethingPattern+")?";
// (?i) makes the match case-insensitive, so get, Get, GET, etc. are all accepted
String pattern = "^(?i)(get|put|head|post|delete)\\s+?" + objetoPattern + "\\s+?HTTP/1.1\\s+?host:\\s+?" + ipPattern + "\\s+?" + connectionPattern + "\\s+?" + contentLenPattern + "\\s+?" + postmanTokenPattern + "\\s+?" + cacheControlPattern + "\\s+?" + originPattern + "\\s+?" + userAgentPattern + "\\s+?" + contentTypePattern + "\\s+?" + acceptPattern + "\\s+?" + acceptEncodingPattern + "\\s+?" + acceptLanguagePattern + "\\s+?$";
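The regex matches and groups most of the requests well, except for GET, HEAD, and POST without a body. I have no idea why this happens. I put a ? at the end of every sub-pattern, such as origin or content-length, to cover the case where that header does not exist in the request. But even so, it does not match these cases. The part of the code that does the matching is: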
Pattern r = Pattern.compile(pattern);
Matcher m = r.matcher(in); // `in` is the input string: the whole request joined into a single line
if(m.find()){
// ......
} else {
System.out.println("Input didn't match");
}
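A scaled-down sketch may help narrow this down. It keeps the structure of the pattern above but with only two optional headers; the class name `OptionalHeaderRepro` and the reduced pattern are illustrative assumptions, and `\\S+` stands in for `(.*)?` to keep it small. It suggests that an optional group that matches nothing still leaves its two surrounding `\\s+?` separators in place, and together they need at least two whitespace characters in a row, while the joined request has only one space between headers.

```java
import java.util.regex.Pattern;

class OptionalHeaderRepro {
    // Reduced version of the pattern in the question (an assumption made
    // for illustration): each optional header is followed by a
    // mandatory "\\s+?" separator.
    static final Pattern SCALED_DOWN = Pattern.compile(
        "^(?i)(get|post)\\s+?(\\S+)\\s+?HTTP/1.1\\s+?"
        + "(content-length:\\s*([0-9]+))?" + "\\s+?"
        + "(accept:\\s*(\\S+))?" + "\\s+?$");

    static boolean matches(String request) {
        return SCALED_DOWN.matcher(request).find();
    }

    public static void main(String[] args) {
        // Both optional headers present: every separator finds its space.
        System.out.println(matches("POST / HTTP/1.1 Content-Length: 9 Accept: */* ")); // true
        // Content-Length absent: two separators must share a single space.
        System.out.println(matches("GET / HTTP/1.1 Accept: */* "));                    // false
        // Same request with a doubled space where the header would be.
        System.out.println(matches("GET / HTTP/1.1  Accept: */* "));                   // true
    }
}
```

If this reading is right, the requests that match are the ones that happen to carry every header the pattern lists, in the same order as the pattern.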
Edit: the part of the code that handles the input from the Socket:
bufferedReader = new BufferedReader(new InputStreamReader(socket.getInputStream()));
String in = "";
while((msgDoSocket = bufferedReader.readLine()) != null){
try {
in += msgDoSocket + " ";
if(msgDoSocket.isEmpty()){
processaInput(in); // this calls the part that process regex
}
} catch (Exception ex) {
Logger.getLogger(ServerThread.class.getName()).log(Level.SEVERE, null, ex);
}
}
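For comparison, here is a minimal sketch of a different technique from the one in the question: keep the lines separate and split each header at its first colon, so the order and presence of headers stop mattering. It still uses only `java.io` and `java.util`, no HTTP library. The class name `HeaderBlockReader`, the method `readHead`, and the `:request-line` key are hypothetical names chosen for illustration, and the sketch stops at the blank line without reading any body.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

class HeaderBlockReader {
    // Reads the request line and the headers up to the first blank line.
    // Header names are lower-cased because HTTP header names are
    // case-insensitive.
    static Map<String, String> readHead(BufferedReader reader) throws IOException {
        Map<String, String> head = new LinkedHashMap<>();
        String requestLine = reader.readLine();
        if (requestLine == null) {
            return head; // connection closed before any data arrived
        }
        head.put(":request-line", requestLine); // hypothetical key for the first line
        String line;
        while ((line = reader.readLine()) != null && !line.isEmpty()) {
            int colon = line.indexOf(':');
            if (colon <= 0) {
                continue; // skip lines that do not look like "Name: value"
            }
            String name = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            String value = line.substring(colon + 1).trim();
            head.put(name, value);
        }
        return head;
    }

    // Convenience overload for trying the parser on a literal string.
    static Map<String, String> readHead(String raw) {
        try {
            return readHead(new BufferedReader(new StringReader(raw)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String raw = "POST / HTTP/1.1\r\nHost: 127.0.0.1:15000\r\n"
                   + "Content-Length: 9\r\n\r\nsome body";
        Map<String, String> head = readHead(raw);
        System.out.println(head.get(":request-line")); // POST / HTTP/1.1
        System.out.println(head.get("host"));           // 127.0.0.1:15000
        System.out.println(head.get("content-length")); // 9
    }
}
```

The request line on its own could still be checked with a small regex such as the method and path part of the pattern above, which keeps the regex exercise without tying it to header order.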