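I'm just trying to get the contents of a page along with its headers... but it seems that my buffer of size 1024 is either too large or too small for the last batch of information coming through. I don't want to read too much or too little, if that makes sense. Here is my code. It prints out the page with all of the information just fine, but I want to make sure it is correct.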
//Build HTTP Get Request
std::stringstream ss;
ss << "GET " << url << " HTTP/1.0\r\nHost: " << strHostName << "\r\n\r\n";
std::string req = ss.str();
// Send Request
send(hSocket, req.c_str(), strlen(req.c_str()), 0);
// Read from socket into buffer.
do
{
    nReadAmount = read(hSocket, pBuffer, sizeof pBuffer);
    printf("%s", pBuffer);
}
while(nReadAmount != 0);
Answer 0 (score: 2)
nReadAmount = read(hSocket, pBuffer, sizeof pBuffer);
printf("%s", pBuffer);
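This is broken. You can only use the %s format specifier with a C-style (zero-terminated) string. How is printf supposed to know how many bytes to print? That information is in nReadAmount, but you never use it.
Also, you call printf even when read fails.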
The simplest fix:
do
{
    nReadAmount = read(hSocket, pBuffer, (sizeof pBuffer) - 1);
    if (nReadAmount <= 0)
        break;
    pBuffer[nReadAmount] = 0;
    printf("%s", pBuffer);
} while(1);
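The fix above still stops printing at the first zero byte inside the data, which matters if the response body is binary. A minimal sketch of an alternative, assuming a POSIX system, writes exactly the number of bytes that read returned, so no terminator is needed. The function name CopyFdToStream is hypothetical and is not part of the original code.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdio>
#include <unistd.h>

// Hypothetical helper: copies everything readable from fd to out, using the
// byte count returned by read() instead of relying on a terminating zero.
// Returns the total number of bytes copied, or -1 on a read or write error.
long CopyFdToStream(int fd, FILE* out)
{
    assert(out != nullptr);
    char buffer[1024];
    long total = 0;
    for (;;)
    {
        ssize_t n = read(fd, buffer, sizeof buffer);
        if (n == 0) break;                 // peer closed the connection
        if (n < 0)
        {
            if (errno == EINTR) continue;  // interrupted by a signal, retry
            return -1;
        }
        // Write exactly n bytes; embedded zero bytes are preserved.
        if (fwrite(buffer, 1, static_cast<size_t>(n), out) != static_cast<size_t>(n))
            return -1;
        total += n;
    }
    return total;
}
```

Calling CopyFdToStream(hSocket, stdout) would replace the whole do/while loop. The buffer size then only affects how many read calls are made, not correctness.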
Answer 1 (score: 1)
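The correct way to read an HTTP reply is to read until you have received a complete LF-delimited line (some servers use a bare LF, even though the official spec requires CRLF). That first line contains the response code and version. Then keep reading LF-delimited lines, which are the headers, until you encounter a 0-length line, which marks the end of the headers. After that, you have to analyze the headers to figure out how the remaining data is encoded, so that you know the proper way to read it and how it is terminated. There are several different possibilities; refer to RFC 2616 Section 4.4 for the actual rules.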
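The line-by-line reading described above could be sketched as follows, assuming a POSIX socket descriptor. The name ReadALineFromSocket is borrowed from the answer's pseudocode, but the body is an assumption, not the answerer's implementation. It reads one byte at a time, so it never consumes data past the line end, and it accepts both CRLF and bare LF.

```cpp
#include <cassert>
#include <cerrno>
#include <string>
#include <unistd.h>

// Reads one LF-terminated line from fd and returns it without the trailing
// LF or CRLF. If the peer closes the connection or an error occurs before a
// LF arrives, whatever was read so far is returned.
std::string ReadALineFromSocket(int fd)
{
    assert(fd >= 0);
    std::string line;
    for (;;)
    {
        char c;
        ssize_t n = read(fd, &c, 1);
        if (n < 0 && errno == EINTR) continue;  // interrupted, retry
        if (n <= 0) break;                      // closed or failed
        if (c == '\n') break;                   // end of line
        line.push_back(c);
    }
    // Tolerate both CRLF and bare LF line endings.
    if (!line.empty() && line.back() == '\r') line.pop_back();
    return line;
}
```

This sketch cannot distinguish an empty line from a closed connection, and one read call per byte is slow. Real code would likely buffer its reads and report errors separately.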
In other words, your code needs to use this kind of structure (pseudocode):
// Send Request
send(hSocket, req.c_str(), req.length(), 0);
// Read Response
std::string line = ReadALineFromSocket(hSocket);
int rescode = ExtractResponseCode(line);
std::vector<std::string> headers;
do
{
    line = ReadALineFromSocket(hSocket);
    if (line.length() == 0) break;
    headers.push_back(line);
}
while (true);
if (
    ((rescode / 100) != 1) &&
    (rescode != 204) &&
    (rescode != 304) &&
    (request is not "HEAD")
    )
{
    if ((headers has "Transfer-Encoding") && (Transfer-Encoding != "identity"))
    {
        // read chunks until a 0-length chunk is encountered.
        // refer to RFC 2616 Section 3.6 for the format of the chunks...
    }
    else if (headers has "Content-Length")
    {
        // read how many bytes the Content-Length header says...
    }
    else if ((headers has "Content-Type") && (Content-Type == "multipart/byteranges"))
    {
        // read until the terminating MIME boundary specified by Content-Type is encountered...
    }
    else
    {
        // read until the socket is disconnected...
    }
}
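A few of the helpers that the pseudocode leaves open might look like the sketch below, assuming a POSIX descriptor and headers stored as "Name: value" strings. ExtractResponseCode is named in the pseudocode; FindHeader and ReadExact are hypothetical names introduced here to illustrate the Content-Length branch. None of this is the answerer's actual code, and the chunked and multipart branches are not covered.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cerrno>
#include <cstddef>
#include <string>
#include <vector>
#include <unistd.h>

// Parses the 3-digit code out of a status line such as "HTTP/1.0 200 OK".
// Returns -1 if the line does not have that shape.
int ExtractResponseCode(const std::string& statusLine)
{
    std::size_t sp = statusLine.find(' ');
    if (sp == std::string::npos || statusLine.size() < sp + 4) return -1;
    int code = 0;
    for (std::size_t i = sp + 1; i < sp + 4; ++i)
    {
        if (!std::isdigit(static_cast<unsigned char>(statusLine[i]))) return -1;
        code = code * 10 + (statusLine[i] - '0');
    }
    return code;
}

// Looks up a header by name, ignoring case. On success, stores the value
// with surrounding spaces and tabs removed and returns true.
bool FindHeader(const std::vector<std::string>& headers,
                const std::string& name, std::string& value)
{
    for (const std::string& h : headers)
    {
        std::size_t colon = h.find(':');
        if (colon != name.size()) continue;  // no colon, or name length differs
        bool same = std::equal(name.begin(), name.end(), h.begin(),
            [](char a, char b) {
                return std::tolower(static_cast<unsigned char>(a)) ==
                       std::tolower(static_cast<unsigned char>(b));
            });
        if (!same) continue;
        std::size_t first = h.find_first_not_of(" \t", colon + 1);
        if (first == std::string::npos) { value.clear(); return true; }
        std::size_t last = h.find_last_not_of(" \t");
        value = h.substr(first, last - first + 1);
        return true;
    }
    return false;
}

// Reads exactly count bytes from fd into body. Returns false if the
// connection closes or fails before count bytes have arrived.
bool ReadExact(int fd, std::size_t count, std::string& body)
{
    assert(fd >= 0);
    body.clear();
    char buffer[1024];
    while (body.size() < count)
    {
        std::size_t want = std::min(sizeof buffer, count - body.size());
        ssize_t n = read(fd, buffer, want);
        if (n < 0 && errno == EINTR) continue;  // interrupted, retry
        if (n <= 0) return false;               // closed early or failed
        body.append(buffer, static_cast<std::size_t>(n));
    }
    return true;
}
```

In the Content-Length branch, the value returned by FindHeader would be converted to a number (for example with std::stoul) and passed to ReadExact. This only works if the header-reading code has not already consumed part of the body from the socket.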