I'm writing an HTTP parser and have these functions:
int parse_useragent(char* buf, int length){
    buf[length] = '\0';
    if(strstr(buf, "MSIE") != NULL){
        return 1;
    }else if(strstr(buf, "Firefox") != NULL){
        return 2;
    }
    return DEFAULT_USERAGENT;
}
void parse_headers(unsigned char* buf, http_record_t * http){
    char * position = (char*)buf;
    char referer[] = "Referer";
    char useragent[] = "User-Agent";
    ...
    int length = getlinelength(position); // returns length of line
    while(length != 1){ // position points to start of line every iteration of cycle
        if(strncmp(position, useragent, sizeof(useragent)-1) == 0){
            http->useragent = parse_useragent(position, length);
            fprintf(stderr,"parsing useragent \n");
        }else if(strncmp(position, referer, sizeof(referer)-1) == 0){
            fprintf(stderr,"parsing referer \n");
            char * tmp = malloc(REFERER_LENGHT * sizeof(char));
            parse_referer(tmp,position, length);
            strncpy(http->referer,tmp, REFERER_LENGHT * sizeof(char) - 1);
        }else if(...
        position += length + 1;
        length = getlinelength(position);
    }
    return;
}
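buf points to the beginning of the HTTP headers.
I have a function like parse_useragent for each header, and I really need to optimize them.
Packet length is usually < 1000, and line length rarely exceeds 100.
Would optimizing for such short strings have any noticeable effect?
I know that some of these algorithms require a different parsing approach than line-by-line parsing. Which approach should I choose under these specific conditions?
Thanks for your help!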
Answer 0 (score: 1)
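If you don't mind hard-coding the strings into the code, I think lex would be the fastest tool for this kind of task, because it explicitly builds a finite-state automaton in the generated source code.
Here is sample lex code for this task: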
%option noyywrap
%{
enum Type{
TFIREFOX = 0, TMSIE = 1
};
enum Type global_variable; /* the variable to store the parsing result */
%}
%%
FIREFOX {global_variable = TFIREFOX; yyterminate();}
MSIE {global_variable = TMSIE; yyterminate();}
. {}
%%
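/* buf must end with two NUL bytes, and n must include them (required by yy_scan_buffer). */
int lex_strstr(char *buf, int n)
{
    global_variable = -1; /* -1 means no known token was found */
    YY_BUFFER_STATE bs = yy_scan_buffer(buf, n);
    if (bs == NULL) /* the buffer did not end with two NUL bytes */
        return -1;
    yy_switch_to_buffer(bs);
    yylex();
    yy_delete_buffer(bs); /* release the scanner's buffer state; buf itself is not freed */
    return global_variable;
}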
Save it in a file such as result.l, then compile it with flex to get a C header file:
flex -o head.h result.l
Here is an example showing how it works:
#include "head.h"
int main()
{
    {
        char buf[] = "this is a test MSIE string\0\0";
        printf("%d\n", lex_strstr(buf, (sizeof buf)));
    }
    {
        char buf[] = "this is a test FIREFOX string\0\0";
        printf("%d\n", lex_strstr(buf, (sizeof buf)));
    }
    {
        char buf[] = "this is a test MSIEFIREFOX string\0\0";
        printf("%d\n", lex_strstr(buf, (sizeof buf)));
    }
    {
        char buf[] = "this is a test MIEFIEFOXdfa\0\0";
        printf("%d\n", lex_strstr(buf, (sizeof buf)));
    }
}
Result:
1
0
1
-1
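For comparison, here is a minimal hand-written sketch of the same idea without flex: one left-to-right pass that dispatches on the current byte and only compares a full token when its first byte matches. It is a simplified stand-in, not the automaton flex generates. The names scan_useragent and enum ua_type are hypothetical, and the return codes are assumed to mirror the enum in the lex example above. Unlike lex_strstr, it takes an explicit length and needs no NUL terminators.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical result codes, chosen to mirror the enum in the lex example. */
enum ua_type { UA_NONE = -1, UA_FIREFOX = 0, UA_MSIE = 1 };

/*
 * Scan buf[0..n) once, left to right, and report the first known token found.
 * The switch dispatches on the current byte, so most positions cost a single
 * comparison; memcmp only runs when the first byte already matches.
 */
static enum ua_type scan_useragent(const char *buf, size_t n)
{
    static const char firefox[] = "FIREFOX";
    static const char msie[] = "MSIE";
    const size_t firefox_len = sizeof firefox - 1;
    const size_t msie_len = sizeof msie - 1;

    for (size_t i = 0; i < n; i++) {
        size_t left = n - i; /* bytes remaining from position i */
        switch (buf[i]) {
        case 'F':
            if (left >= firefox_len && memcmp(buf + i, firefox, firefox_len) == 0)
                return UA_FIREFOX;
            break;
        case 'M':
            if (left >= msie_len && memcmp(buf + i, msie, msie_len) == 0)
                return UA_MSIE;
            break;
        default:
            break;
        }
    }
    return UA_NONE;
}
```

Calling scan_useragent(position, length) on a header line would avoid writing a terminator into the packet buffer. Whether this is measurably faster than strstr on lines of about 100 bytes is something to benchmark rather than assume.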
Answer 1 (score: 0)