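I'm having trouble parsing some HTTP headers in C++. Right now I want to find the carriage return/line feed combination that ends each HTTP header entry. I'm using str.find() to do this: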
string hdr; //filled with the header data
int line_end_pos = hdr.find("\r\n"); //also tried "\\r\\n", same results
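Even though I know the header contains the carriage return/line feed combination, find() still returns -1. What am I missing here?
Edit
The library I'm using provides several different functions for displaying the data. In string format, a sample of the header data looks like this: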
GET /p/libcrafter/ HTTP/1.1
Host: code.google.com
User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:24.0) Gecko/20100101 Firefox/24.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en,en-us;q=0.5
Accept-Encoding: gzip, deflate
DNT: 1
Cookie: PREF=ID=ad8fd3ab4b0bd3c9:U=e1bd88556eeb2dce:FF=0:TM=1382531357:LM=1382531841:S=Pbh-JiokGeVbsSh-; NID=67=olK2k5sUZ95mRApV77s7CfXscytJSfmVuyubiSCMotOdBBvijqrTwyyifLQZbZA_SCTVQXqTEoE6hqaqVJkRpqoY2RPDFBPghbe5czX6QxKw7lBdOaP6-IpzGXYMWl6Q; OGPC=4061029-5:; __utma=247248150.2068354019.1382532826.1382532826.1382532826.1; __utmb=247248150.10.10.1382532826; __utmc=247248150; __utmz=247248150.1382532826.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none)
Connection: keep-alive
Cache-Control: max-age=0
In "Hex Dump" format it looks like this:
47455420 2F702F6C 69626372 61667465 GET /p/libcrafte 00000000
722F2048 5454502F 312E310D 0A486F73 r/ HTTP/1.1..Hos 00000010
743A2063 6F64652E 676F6F67 6C652E63 t: code.google.c 00000020
6F6D0D0A 55736572 2D416765 6E743A20 om..User-Agent: 00000030
4D6F7A69 6C6C612F 352E3020 28583131 Mozilla/5.0 (X11 00000040
3B205562 756E7475 3B204C69 6E757820 ; Ubuntu; Linux 00000050
7838365F 36343B20 72763A32 342E3029 x86_64; rv:24.0) 00000060
20476563 6B6F2F32 30313030 31303120 Gecko/20100101 00000070
46697265 666F782F 32342E30 0D0A4163 Firefox/24.0..Ac 00000080
63657074 3A207465 78742F68 746D6C2C cept: text/html, 00000090
6170706C 69636174 696F6E2F 7868746D application/xhtm 000000A0
6C2B786D 6C2C6170 706C6963 6174696F l+xml,applicatio 000000B0
6E2F786D 6C3B713D 302E392C 2A2F2A3B n/xml;q=0.9,*/*; 000000C0
713D302E 380D0A41 63636570 742D4C61 q=0.8..Accept-La 000000D0
6E677561 67653A20 656E2C65 6E2D7573 nguage: en,en-us 000000E0
3B713D30 2E350D0A 41636365 70742D45 ;q=0.5..Accept-E 000000F0
6E636F64 696E673A 20677A69 702C2064 ncoding: gzip, d 00000100
65666C61 74650D0A 444E543A 20310D0A eflate..DNT: 1.. 00000110
436F6F6B 69653A20 50524546 3D49443D Cookie: PREF=ID= 00000120
61643866 64336162 34623062 64336339 ad8fd3ab4b0bd3c9 00000130
3A553D65 31626438 38353536 65656232 :U=e1bd88556eeb2 00000140
6463653A 46463D30 3A544D3D 31333832 dce:FF=0:TM=1382 00000150
35333133 35373A4C 4D3D3133 38323533 531357:LM=138253 00000160
31383431 3A533D50 62682D4A 696F6B47 1841:S=Pbh-JiokG 00000170
65566273 53682D3B 204E4944 3D36373D eVbsSh-; NID=67= 00000180
6F6C4B32 6B357355 5A39356D 52417056 olK2k5sUZ95mRApV 00000190
37377337 43665873 6379744A 53666D56 77s7CfXscytJSfmV 000001A0
75797562 6953434D 6F744F64 42427669 uyubiSCMotOdBBvi 000001B0
6A717254 77797969 664C515A 625A415F jqrTwyyifLQZbZA_ 000001C0
53435456 51587154 456F4536 68716171 SCTVQXqTEoE6hqaq 000001D0
564A6B52 70716F59 32525044 46425067 VJkRpqoY2RPDFBPg 000001E0
68626535 637A5836 51784B77 376C4264 hbe5czX6QxKw7lBd 000001F0
4F615036 2D49707A 4758594D 576C3651 OaP6-IpzGXYMWl6Q 00000200
3B204F47 50433D34 30363130 32392D35 ; OGPC=4061029-5 00000210
3A3B205F 5F75746D 613D3234 37323438 :; __utma=247248 00000220
3135302E 32303638 33353430 31392E31 150.2068354019.1 00000230
33383235 33323832 362E3133 38323533 382532826.138253 00000240
32383236 2E313338 32353332 3832362E 2826.1382532826. 00000250
313B205F 5F75746D 623D3234 37323438 1; __utmb=247248 00000260
3135302E 31302E31 302E3133 38323533 150.10.10.138253 00000270
32383236 3B205F5F 75746D63 3D323437 2826; __utmc=247 00000280
32343831 35303B20 5F5F7574 6D7A3D32 248150; __utmz=2 00000290
34373234 38313530 2E313338 32353332 47248150.1382532 000002A0
3832362E 312E312E 75746D63 73723D28 826.1.1.utmcsr=( 000002B0
64697265 6374297C 75746D63 636E3D28 direct)|utmccn=( 000002C0
64697265 6374297C 75746D63 6D643D28 direct)|utmcmd=( 000002D0
6E6F6E65 290D0A43 6F6E6E65 6374696F none)..Connectio 000002E0
6E3A206B 6565702D 616C6976 650D0A43 n: keep-alive..C 000002F0
61636865 2D436F6E 74726F6C 3A206D61 ache-Control: ma 00000300
782D6167 653D300D 0A0D0A x-age=0.... 00000310
And finally, this is what it looks like as a "Raw String":
\x47\x45\x54\x20\x2f\x70\x2f\x6c\x69\x62\x63\x72\x61\x66\x74\x65\x72\x2f\x20\x48
\x54\x54\x50\x2f\x31\x2e\x31\xd\xa\x48\x6f\x73\x74\x3a\x20\x63\x6f\x64\x65\x2e\x67
\x6f\x6f\x67\x6c\x65\x2e\x63\x6f\x6d\xd\xa\x55\x73\x65\x72\x2d\x41\x67\x65\x6e\x74
\x3a\x20\x4d\x6f\x7a\x69\x6c\x6c\x61\x2f\x35\x2e\x30\x20\x28\x58\x31\x31\x3b\x20\x55
\x62\x75\x6e\x74\x75\x3b\x20\x4c\x69\x6e\x75\x78\x20\x78\x38\x36\x5f\x36\x34\x3b\x20
\x72\x76\x3a\x32\x34\x2e\x30\x29\x20\x47\x65\x63\x6b\x6f\x2f\x32\x30\x31\x30\x30\x31
\x30\x31\x20\x46\x69\x72\x65\x66\x6f\x78\x2f\x32\x34\x2e\x30\xd\xa\x41\x63\x63\x65\x70
\x74\x3a\x20\x74\x65\x78\x74\x2f\x68\x74\x6d\x6c\x2c\x61\x70\x70\x6c\x69\x63\x61\x74
\x69\x6f\x6e\x2f\x78\x68\x74\x6d\x6c\x2b\x78\x6d\x6c\x2c\x61\x70\x70\x6c\x69\x63\x61
\x74\x69\x6f\x6e\x2f\x78\x6d\x6c\x3b\x71\x3d\x30\x2e\x39\x2c\x2a\x2f\x2a\x3b\x71\x3d
\x30\x2e\x38\xd\xa\x41\x63\x63\x65\x70\x74\x2d\x4c\x61\x6e\x67\x75\x61\x67\x65\x3a\x20
\x65\x6e\x2c\x65\x6e\x2d\x75\x73\x3b\x71\x3d\x30\x2e\x35\xd\xa\x41\x63\x63\x65\x70\x74
\x2d\x45\x6e\x63\x6f\x64\x69\x6e\x67\x3a\x20\x67\x7a\x69\x70\x2c\x20\x64\x65\x66\x6c\x61
\x74\x65\xd\xa\x44\x4e\x54\x3a\x20\x31\xd\xa\x43\x6f\x6f\x6b\x69\x65\x3a\x20\x50\x52
\x45\x46\x3d\x49\x44\x3d\x61\x64\x38\x66\x64\x33\x61\x62\x34\x62\x30\x62\x64\x33\x63
\x39\x3a\x55\x3d\x65\x31\x62\x64\x38\x38\x35\x35\x36\x65\x65\x62\x32\x64\x63\x65\x3a
\x46\x46\x3d\x30\x3a\x54\x4d\x3d\x31\x33\x38\x32\x35\x33\x31\x33\x35\x37\x3a\x4c\x4d
\x3d\x31\x33\x38\x32\x35\x33\x31\x38\x34\x31\x3a\x53\x3d\x50\x62\x68\x2d\x4a\x69\x6f
\x6b\x47\x65\x56\x62\x73\x53\x68\x2d\x3b\x20\x4e\x49\x44\x3d\x36\x37\x3d\x6f\x6c\x4b
\x32\x6b\x35\x73\x55\x5a\x39\x35\x6d\x52\x41\x70\x56\x37\x37\x73\x37\x43\x66\x58\x73
\x63\x79\x74\x4a\x53\x66\x6d\x56\x75\x79\x75\x62\x69\x53\x43\x4d\x6f\x74\x4f\x64\x42
\x42\x76\x69\x6a\x71\x72\x54\x77\x79\x79\x69\x66\x4c\x51\x5a\x62\x5a\x41\x5f\x53\x43
\x54\x56\x51\x58\x71\x54\x45\x6f\x45\x36\x68\x71\x61\x71\x56\x4a\x6b\x52\x70\x71\x6f
\x59\x32\x52\x50\x44\x46\x42\x50\x67\x68\x62\x65\x35\x63\x7a\x58\x36\x51\x78\x4b\x77
\x37\x6c\x42\x64\x4f\x61\x50\x36\x2d\x49\x70\x7a\x47\x58\x59\x4d\x57\x6c\x36\x51\x3b
\x20\x4f\x47\x50\x43\x3d\x34\x30\x36\x31\x30\x32\x39\x2d\x35\x3a\x3b\x20\x5f\x5f\x75
\x74\x6d\x61\x3d\x32\x34\x37\x32\x34\x38\x31\x35\x30\x2e\x32\x30\x36\x38\x33\x35\x34
\x30\x31\x39\x2e\x31\x33\x38\x32\x35\x33\x32\x38\x32\x36\x2e\x31\x33\x38\x32\x35\x33
\x32\x38\x32\x36\x2e\x31\x33\x38\x32\x35\x33\x32\x38\x32\x36\x2e\x31\x3b\x20\x5f\x5f
\x75\x74\x6d\x62\x3d\x32\x34\x37\x32\x34\x38\x31\x35\x30\x2e\x31\x30\x2e\x31\x30\x2e
\x31\x33\x38\x32\x35\x33\x32\x38\x32\x36\x3b\x20\x5f\x5f\x75\x74\x6d\x63\x3d\x32\x34
\x37\x32\x34\x38\x31\x35\x30\x3b\x20\x5f\x5f\x75\x74\x6d\x7a\x3d\x32\x34\x37\x32\x34
\x38\x31\x35\x30\x2e\x31\x33\x38\x32\x35\x33\x32\x38\x32\x36\x2e\x31\x2e\x31\x2e\x75
\x74\x6d\x63\x73\x72\x3d\x28\x64\x69\x72\x65\x63\x74\x29\x7c\x75\x74\x6d\x63\x63\x6e
\x3d\x28\x64\x69\x72\x65\x63\x74\x29\x7c\x75\x74\x6d\x63\x6d\x64\x3d\x28\x6e\x6f\x6e
\x65\x29\xd\xa\x43\x6f\x6e\x6e\x65\x63\x74\x69\x6f\x6e\x3a\x20\x6b\x65\x65\x70\x2d\x61
\x6c\x69\x76\x65\xd\xa\x43\x61\x63\x68\x65\x2d\x43\x6f\x6e\x74\x72\x6f\x6c\x3a\x20\x6d
\x61\x78\x2d\x61\x67\x65\x3d\x30\xd\xa\xd\xa
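As you can see, when output in hex format the lines end with 0D and 0A, and in raw string format they end with \xd and \xa. My question remains: how can I (or can't I) find these line-ending characters when handling the data as a string?
Answer 0: (score: 0)
The output of the following program is 35: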
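#include <iostream>
#include <string>
using namespace std;
int main()
{
    // Two header lines, each terminated by CR LF
    string hdr = "Date: Wed, 23 Oct 2013 02:20:30 GMT\r\nServer: Apache\r\n";
    string::size_type line_end_pos = hdr.find("\r\n"); // string::npos if not found
    cout << line_end_pos; // prints 35
}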
If we then modify this code so that it also writes the string to a file, it becomes:
#include <iostream>
#include <fstream>
#include <string>
using namespace std;
int main()
{
    string hdr = "Date: Wed, 23 Oct 2013 02:20:30 GMT\r\nServer: Apache\r\n";
    string::size_type line_end_pos = hdr.find("\r\n");
    cout << line_end_pos;
    // Text mode (no binary flag): on Windows each '\n' is written as "\r\n"
    fstream output;
    output.open("test.txt", std::fstream::out);
    output << hdr;
    output.close();
}
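We get a file containing the contents of hdr. Viewing it in a hex editor shows that some translation of the output has taken place. Between GMT and Server we expect to see two characters, 0x0D and 0x0A. However, test.txt actually has three characters there: 0x0D, 0x0D, 0x0A. In text mode on Windows, every 0x0A is written as 0x0D 0x0A, so the file is 55 bytes long even though the input string is 53 bytes (characters) long.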
If we bitwise-OR the std::fstream::binary flag with std::fstream::out,
output.open("test.txt", std::fstream::out | std::fstream::binary);
then the output is an identical copy of the string held in hdr, i.e. 53 bytes long, with a single 0x0d, 0x0a pair between lines.
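As a rough illustration of the binary-mode behaviour described above, here is a minimal sketch that writes a string in binary mode and reads it back. The function name round_trip_binary and the file path passed to it are hypothetical, chosen only for this example:

```cpp
#include <fstream>
#include <iterator>
#include <string>

// Hypothetical helper: writes `data` to `path` in binary mode
// (no newline translation) and returns the bytes read back from the file.
std::string round_trip_binary(const std::string& path, const std::string& data)
{
    {
        std::ofstream out(path, std::ios::out | std::ios::binary | std::ios::trunc);
        out.write(data.data(), static_cast<std::streamsize>(data.size()));
    } // the output stream is flushed and closed here

    std::ifstream in(path, std::ios::in | std::ios::binary);
    // Read every byte of the file, including any '\r' and '\n'
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}
```

With both streams opened in binary mode, the string returned should be byte-for-byte the same as the one passed in, so hdr would come back 53 bytes long.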
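Edit: It's also worth pointing out that Unix and Windows-based systems have different line-ending conventions. I wrote this code under Windows.
So, I suggest you save a copy of the header and examine it with a hex editor; unless you do that or use a debugger, you can't know where the problem lies. I generally find it safest to treat text input as binary input, since no translation of line-ending characters takes place.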
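If a hex editor isn't handy, the bytes of the string can also be dumped from within the program. The following is only a sketch; the helper name to_hex is made up for this example and is not part of any library mentioned in the question:

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: returns the bytes of `s` as space-separated,
// two-digit uppercase hex values, e.g. "41 0D 0A".
std::string to_hex(const std::string& s)
{
    std::string out;
    char buf[4];
    for (std::size_t i = 0; i < s.size(); ++i) {
        // Go through unsigned char so bytes >= 0x80 are not sign-extended
        unsigned value = static_cast<unsigned char>(s[i]);
        std::snprintf(buf, sizeof buf, "%02X", value);
        if (i != 0) out += ' ';
        out += buf;
    }
    return out;
}
```

Printing to_hex(hdr) just before the call to find() shows whether the string really contains 0D 0A at that point, or whether something earlier in the program has already altered it.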
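Edit 2: When you run this, do you get a result of 27? If so, I'm afraid I'm out of ideas for now. I'll give your problem some more thought in the morning when I'm fresh.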
#include <iostream>
#include <string>
using namespace std;
int main()
{
    // First 48 bytes of the request, copied from the hex dump
    char rawData[] =
    {
        0x47,0x45,0x54,0x20, 0x2F,0x70,0x2F,0x6C, 0x69,0x62,0x63,0x72, 0x61,0x66,0x74,0x65,
        0x72,0x2F,0x20,0x48, 0x54,0x54,0x50,0x2F, 0x31,0x2E,0x31,0x0D, 0x0A,0x48,0x6F,0x73,
        0x74,0x3A,0x20,0x63, 0x6F,0x64,0x65,0x2E, 0x67,0x6F,0x6F,0x67, 0x6C,0x65,0x2E,0x63
    };
    // rawData has no terminating '\0', so pass its length explicitly
    string hdr(rawData, sizeof rawData);
    string::size_type newLinePos = hdr.find("\r\n");
    cout << newLinePos; // prints 27
}
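Once find() locates the first "\r\n", the same call can be repeated from the position just past it to walk through every header line. Below is a minimal sketch of that loop; the function name split_header_lines is hypothetical, and the code assumes the whole header block is already held in one std::string:

```cpp
#include <string>
#include <vector>

// Hypothetical helper: splits a block of HTTP headers into lines,
// without the trailing "\r\n". It stops at the blank line that ends
// the header block, or when no further "\r\n" is found.
std::vector<std::string> split_header_lines(const std::string& hdr)
{
    std::vector<std::string> lines;
    std::string::size_type start = 0;
    while (true) {
        std::string::size_type end = hdr.find("\r\n", start);
        if (end == std::string::npos) break; // no more line endings
        if (end == start) break;             // empty line: end of headers
        lines.push_back(hdr.substr(start, end - start));
        start = end + 2;                     // skip past "\r\n"
    }
    return lines;
}
```

The result of find() is kept in std::string::size_type and compared against std::string::npos rather than -1. Storing it in an int is what makes a failed search show up as -1.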