Annoying NUL blocks in socket buffer

Time: 2015-02-23 00:19:42

Tags: c++ string sockets null buffer

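I am currently trying to write C++ code on a Mac that downloads a large file (~1 GB) from a website. I think I have a bug where I convert the socket buffer to a string, because my result file (a movie file) has small blocks of NUL characters scattered throughout it. I need to somehow remove them from the string obtained from the socket buffer.

Here is the part that handles the HTTP connection and the part that saves the data to the file. Some parts may be missing from this example, such as error handling or the complete socket setup.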

//I have error handling in here but stripped out from this example
char buffer[512];
portno = atoi("8080");
sockfd = socket(AF_INET, SOCK_STREAM, 0);
server = gethostbyname(address);

bzero((char *) &serv_addr, sizeof(serv_addr));

serv_addr.sin_family = AF_INET;

bcopy((char *)server->h_addr,
      (char *)&serv_addr.sin_addr.s_addr,
      server->h_length);

serv_addr.sin_port = htons(portno);

bzero(buffer,512);
header.copy(buffer,512);

n = write(sockfd,buffer,strlen(buffer));

std::string str_buff;

while((n = read(sockfd,buffer,511)) > 0){

    std::string temp(buffer,511);
    //Is this the error^^^^^^^^^?

    write_chunk_to_file(temp);
    //cut



void write_chunk_to_file(std::string chunk){
   write.open(path+fname, std::ios::out | std::ios::app);

   write << remove_header(chunk);

   write.close();
   //cut




std::string remove_header(std::string chunk){

   if(chunk.find("")){
       chunk = chunk.substr(chunk.find(""),chunk.length());
   }

   return chunk;

}

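When I compare the file downloaded by my code with the file that wget downloads, mine contains small blocks consisting only of NUL characters, and it also contains some extra bytes.

Does anyone have a clue?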

2 answers:

Answer 0: (score: 0)

Yes, the line you pointed out is the error:

std::string temp(buffer,511);
//Is this the error^^^^^^^^^?

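read() returns the number of bytes actually read into the buffer. Constructing the string with a fixed length of 511 also copies whatever sits in the buffer beyond those bytes (zeros left by bzero() or leftovers from an earlier read), which is where the NUL blocks and the extra bytes come from. You need to take the return value into account: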

std::string temp(buffer,n);
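
To make the difference concrete, here is a minimal sketch that needs no socket. The helper name `chunk_from_read` and its `capacity` parameter are hypothetical, introduced only for illustration; the point is that the string is built from exactly the `n` bytes that `read()` reported, so stale or zeroed bytes further along in the buffer are never copied.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: build a string from only the bytes that read()
// actually delivered. `n` is the (non-negative) return value of read(),
// `capacity` is the size of the buffer that was passed to read().
std::string chunk_from_read(const char* buffer, std::size_t capacity, long n) {
    // A negative n means read() failed; the caller should handle that first.
    assert(n >= 0 && static_cast<std::size_t>(n) <= capacity);
    // The (pointer, length) constructor copies exactly n bytes,
    // including any NUL bytes that are genuinely part of the data.
    return std::string(buffer, static_cast<std::size_t>(n));
}
```

In the loop from the question this would probably be called as `chunk_from_read(buffer, sizeof(buffer), n)` right after each successful `read()`.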

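Also, you are reading raw data, so remove_header() does not belong in write_chunk_to_file(). A buffer can contain data for multiple headers and/or part of the body. You need to implement a proper HTTP parser so that you can detect where each header ends, where the body starts, where the body ends, and how the body is encoded. Then you can write only the body data to your file.

This code does not even read the HTTP response correctly. You need to implement logic more like this (I leave it as an exercise for you to implement in C++):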
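
As a rough illustration of why the header check cannot be done per chunk, the sketch below accumulates bytes across reads until the blank line that ends the headers has arrived, and only then starts handing out body bytes. The class `HeaderSplitter` and its methods are hypothetical names, not part of the question's code, and this is only the header/body split, not a full HTTP parser.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: buffers incoming bytes until the "\r\n\r\n" that
// terminates the HTTP headers has been seen, then passes body bytes through.
class HeaderSplitter {
public:
    // Feed the n bytes returned by one read(); returns any body bytes found.
    std::string feed(const char* data, std::size_t n) {
        if (headers_done_) return std::string(data, n);
        pending_.append(data, n);
        // Search the accumulated data, so a terminator that straddles
        // two reads is still detected.
        std::size_t end = pending_.find("\r\n\r\n");
        if (end == std::string::npos) return std::string();  // headers incomplete
        headers_ = pending_.substr(0, end);
        std::string body = pending_.substr(end + 4);
        pending_.clear();
        headers_done_ = true;
        return body;
    }

    const std::string& headers() const { return headers_; }
    bool headers_done() const { return headers_done_; }

private:
    std::string pending_;   // bytes received before the headers were complete
    std::string headers_;   // status line and header lines, without the blank line
    bool headers_done_ = false;
};
```

The bytes returned by `feed()` are still raw: depending on the headers they may need further decoding (for example chunked transfer encoding) before they are written to the file.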

send request
while true:
    read line
    if not successful:
        throw error
    if line is blank:
        break while loop
    add line to headers list
parse headers list
if response can contain message body:
    if HTTP version is 1.1+, and Transfer-Encoding header is present and not "identity":
        while true:
            read line, extract delimited ASCII hexadecimal for the chunk size
            if not successful:
                throw error
            if chunk size is 0:
                break while loop
            read chunk size number of bytes
        while true:
            read line
            if not successful:
                throw error
            if line is blank:
                break while loop
            add line to headers list, replace existing header if needed
        parse headers list again
    else if Content-Length header is specified:
        read Content-Length number of bytes
    else if Content-Type header is "multipart/byteranges":
        read and parse MIME-encoded chunks until terminating MIME boundary is reached
    else:
        read until connection is closed
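
The chunked branch of the outline above could be sketched roughly as follows. This is a simplified illustration, not a complete implementation: the function name `decode_chunked` is hypothetical, it assumes the whole chunked body is already in memory (a real 1 GB download would decode incrementally), it ignores chunk extensions, and it stops at the zero-size chunk without parsing trailer headers. It also consumes the CRLF that follows each chunk's data.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: decode a chunked body held entirely in `raw`.
// Appends the decoded bytes to `body`; returns false if the input is
// malformed or truncated.
bool decode_chunked(const std::string& raw, std::string& body) {
    std::size_t pos = 0;
    while (true) {
        std::size_t eol = raw.find("\r\n", pos);
        if (eol == std::string::npos) return false;  // size line incomplete

        // Parse the leading hexadecimal chunk size; anything after the
        // digits (e.g. ";extension") is ignored in this sketch.
        std::size_t size = 0;
        std::size_t i = pos;
        for (; i < eol && std::isxdigit(static_cast<unsigned char>(raw[i])); ++i) {
            unsigned char c = static_cast<unsigned char>(raw[i]);
            int digit = std::isdigit(c) ? c - '0' : std::tolower(c) - 'a' + 10;
            size = size * 16 + static_cast<std::size_t>(digit);
        }
        if (i == pos) return false;  // no hex digits at all

        pos = eol + 2;               // skip the CRLF after the size line
        if (size == 0) return true;  // last chunk; trailers are not parsed here

        std::size_t remaining = raw.size() - pos;
        if (size > remaining || remaining - size < 2) return false;  // truncated
        body.append(raw, pos, size);  // raw bytes, NULs included
        pos += size;
        if (raw.compare(pos, 2, "\r\n") != 0) return false;  // chunk must end with CRLF
        pos += 2;
    }
}
```

Because the decoded bytes are appended by explicit length, binary data such as a movie file passes through unchanged.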

Answer 1: (score: 0)

OK, changing the following line fixed it:

std::string temp(buffer,511);
//changed to:
std::string temp(buffer,n);

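By always copying 511 bytes I really was getting "more" than was there; I only need the n bytes that read() actually read from the socket. Thanks for the hint, guys :D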