Dynamically allocating memory to store HTML source downloaded with WinHttpReadData

Date: 2014-02-24 10:36:29

Tags: c++ memory-management winhttp

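First off: string classes are not allowed. This is a requirement.

I am trying to use WinHTTP to download content over HTTP. I started from the example provided on MSDN (http://msdn.microsoft.com/en-us/library/windows/desktop/aa384270(v=vs.85).aspx).

As some of you will know, WinHttpReadData() reads each chunk of data into a temporary buffer; it does not append to the existing data until the request completes. That is fine if you only want to print the buffer each time, but I need to store the entire response in one buffer for later use.

To do this, I created a struct that holds everything needed to perform the request, and I pass this struct by reference to the function that performs the request. The struct looks like this: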

struct HttpG
{
    wchar_t*    wszUserAgent; 
    wchar_t*    wszCookie;
    wchar_t*    wszHost;
    wchar_t*    wszPath; 
    char*       szResponse;
};

The function that performs the request is declared as follows:

int HttpGet(HttpG &http_get);

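So far so good...

The problem appears when I try to dynamically allocate memory for http_get.szResponse: not all of the data ends up being stored. I won't post the entire MSDN example, only the part that is giving me trouble. If you look at the MSDN link above, you will see which part of the code I mean. This is the main loop that downloads the data.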

// Read the Data.
ZeroMemory(szOutBuffer, dwSize + 1);

if(!WinHttpReadData(hRequest, (LPVOID)szOutBuffer, dwSize, &dwDownloaded))
{                                  
    OutputDebugStr("Error in WinHttpReadData\n");   
}
else
{
     // Read data here              
     if(http_get.szResponse == NULL)
     {                  
         // This part seems to work as needed
         http_get.szResponse = new char[dwSize + 1];        
         ZeroMemory(http_get.szResponse, dwSize + 1);
         strcpy(http_get.szResponse, szOutBuffer);
         http_get.szResponse[dwSize + 1] = '\0';            
     }              
     else
     {
         // Im sure the problems is here, full source
         // is not getting put into http_get.szResponse.

         // Create temp buffer
         szTemp = new char[strlen(http_get.szResponse) + 1];    
         ZeroMemory(szTemp, strlen(http_get.szResponse) + 1);
         strcat(szTemp, http_get.szResponse);                   

         // Resize origonal buffer to hold new data                 
         http_get.szResponse = new char[strlen(szTemp) + dwSize + 1];
         ZeroMemory(http_get.szResponse, strlen(szTemp) + dwSize + 1);
         strcpy(http_get.szResponse, szTemp);
         strcat(http_get.szResponse, szOutBuffer);
         http_get.szResponse[strlen(szTemp) + dwSize + 1] = '\0';               
     }              
}           

// Free the memory allocated to the buffer.
delete[] szTemp;
delete[] szOutBuffer;           

// This condition should never be reached since WinHttpQueryDataAvailable
// reported that there are bits to read.
if(!dwDownloaded)
{
    break;
}

I create the struct and call the function like this:

HttpG http_get;
http_get.wszHost = L"au.yahoo.com";
http_get.wszPath = L"/?p=us";
http_get.wszUserAgent = L"Blah blah blah";
http_get.szResponse = NULL;

HttpGet(http_get);  

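So basically, when the request finishes I want all of the data to be in http_get.szResponse. Sorry if this is a bit confusing or vague; I have tried to explain it as well as I can. What am I doing wrong? I have been stuck on this for a while, so any help is much appreciated.

Thanks, everyone.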

3 answers:

Answer 0: (score: 1)

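Even the code you believe is correct is not. You assume that szOutBuffer is null-terminated. Read the documentation: the &dwDownloaded parameter is there for a reason. It tells you how many bytes were actually read.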
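To make the point about dwDownloaded concrete, here is a minimal sketch of copying a chunk by its byte count instead of relying on a terminator. AppendChunk is a hypothetical helper name, not part of WinHTTP, and it assumes the caller has already made dest large enough for used + count + 1 bytes.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper (not a WinHTTP function): copies exactly `count` bytes
// of `chunk` into `dest` starting at offset `used`, then writes the
// terminator. Assumes `dest` has room for used + count + 1 bytes.
// Returns the new number of payload bytes stored in `dest`.
std::size_t AppendChunk(char* dest, std::size_t used,
                        const char* chunk, std::size_t count)
{
    std::memcpy(dest + used, chunk, count); // does not need a '\0' in chunk
    dest[used + count] = '\0';              // last valid index of the buffer
    return used + count;
}
```

In the download loop, count would be the dwDownloaded value reported by WinHttpReadData, and used would be a running total kept by the caller, so strlen is never needed.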

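The "wrong" code of course has the same bug. In addition, you leak the old szResponse (precisely because you are not using a string class).

You then make things worse by shuffling the string pieces around in a thoroughly wrong way. You appear to append the old response to an empty string szTemp (why?), copy that back into a freshly allocated szResponse, and then append szOutBuffer (still with the wrong size).

Finally, you write the \0 one element past the end of szResponse[].

Style issue: you are incorrectly assuming that strlen is free, or at least O(1).

This code is a textbook example of why people should use std::string. I strongly advise you not to patch it. Rewriting it with a string class is the only sensible course of action.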
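For comparison, if the no-strings requirement could be lifted, the accumulation step might look like the sketch below. AppendToString is a hypothetical name; the only library call used is std::string::append(const char*, size_t), which copies exactly the given number of bytes and handles growth and cleanup itself.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: appends exactly `count` bytes of a downloaded chunk.
// The chunk does not need to be null-terminated.
void AppendToString(std::string& response, const char* chunk, std::size_t count)
{
    response.append(chunk, count);
}
```

In the read loop this would be called once per successful WinHttpReadData with dwDownloaded as count. There is no temporary buffer, no manual terminator, and nothing to leak.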

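Answer 1: (score: 1)

You need to call WinHttpReadData in a loop and memcpy each downloaded chunk into a second buffer that holds everything until you have retrieved the whole response. Keep track of where the end of that buffer is each time you copy into it.

Something like this (heavily simplified, and only meant to show the basic structure of the loop):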

size_t bufSize = 4096;                       // initial capacity, grows as needed
char *myBuffer = (char *)malloc(bufSize);    // error checking omitted for brevity
size_t totalBytes = 0;                       // bytes stored in myBuffer so far
bool done = false;
while (!done)
{
    if (WinHttpReadData(hRequest, (LPVOID)outBuffer, dwSize, &dwDownloaded))
    {
        // if nothing left to download, we're done
        if (dwDownloaded == 0)
            done = true;
        else
        {
            // Grow myBuffer if this chunk plus the terminator would not fit
            if (totalBytes + dwDownloaded + 1 > bufSize)
            {
                bufSize = (totalBytes + dwDownloaded + 1) * 2;
                myBuffer = (char *)realloc(myBuffer, bufSize);
            }

            // Copy by offset: realloc() may have moved the block,
            // so a saved pointer into the old block would be invalid
            memcpy(myBuffer + totalBytes, outBuffer, dwDownloaded);
            totalBytes += dwDownloaded;
        }
    }
    else
        done = true;   // read failed; stop instead of looping forever
}

// Null terminate it so you can treat it like a C string.
myBuffer[totalBytes] = '\0';

// Now myBuffer contains the entire downloaded response as a null-terminated string.  Do whatever you want with it.
// Don't forget to free(myBuffer) when you're done with it.

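Note: this is not a working code sample and may contain bugs or even syntax errors (I have not tested or even compiled it). It is only meant to show the basic structure of a loop that does what the asker is trying to do.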
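The grow-and-append step of such a loop can be isolated and tested without WinHTTP. The following is a rough sketch under that assumption: GrowBuffer and GrowBufferAppend are hypothetical names, and the doubling growth policy and 64-byte starting capacity are arbitrary choices, not something the answer prescribes.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical growable byte buffer. Start it as { nullptr, 0, 0 }.
struct GrowBuffer
{
    char*       data;      // malloc/realloc-owned storage, or nullptr
    std::size_t size;      // payload bytes stored (terminator not counted)
    std::size_t capacity;  // bytes allocated
};

// Appends `count` bytes and keeps the buffer null-terminated.
// Returns false if allocation fails; the old contents stay valid then.
bool GrowBufferAppend(GrowBuffer& buf, const void* chunk, std::size_t count)
{
    const std::size_t needed = buf.size + count + 1; // +1 for the terminator
    if (needed > buf.capacity)
    {
        std::size_t newCapacity = buf.capacity ? buf.capacity : 64;
        while (newCapacity < needed)
            newCapacity *= 2; // doubling keeps the total copying linear

        // Assign to a temporary first so the old block is not lost on failure.
        char* grown = static_cast<char*>(std::realloc(buf.data, newCapacity));
        if (grown == nullptr)
            return false;
        buf.data = grown;
        buf.capacity = newCapacity;
    }

    if (count != 0)
        std::memcpy(buf.data + buf.size, chunk, count);
    buf.size += count;
    buf.data[buf.size] = '\0';
    return true;
}
```

Inside the read loop, each successful WinHttpReadData would be followed by GrowBufferAppend(buf, outBuffer, dwDownloaded), and the caller would release the storage with free(buf.data) once finished.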

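Answer 2: (score: 1)

You have to call WinHttpReadData() in a loop until there is no more data to read, and you need to dynamically (re)allocate the response buffer on each iteration of that loop. If the requirements forbid std::string, they probably forbid std::vector as well, so you will have to fall back on manual memory management, for example: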

struct HttpG
{
    wchar_t*    wszUserAgent; 
    wchar_t*    wszCookie;
    wchar_t*    wszHost;
    wchar_t*    wszPath; 
    u_char*     ucResponse;
    int         ucResponseSize;
};

u_char ucBuffer[1024], *ucTemp;    
DWORD dwDownloaded;

do
{
    if (!WinHttpReadData(hRequest, ucBuffer, sizeof(ucBuffer), &dwDownloaded))
    {                                  
        OutputDebugStr("Error in WinHttpReadData\n");   
        break;
    }

    if (dwDownloaded == 0)
        break;

    if (http_get.ucResponse == NULL)
    {                  
        http_get.ucResponse = new u_char[dwDownloaded];        
        memcpy(http_get.ucResponse, ucBuffer, dwDownloaded);
        http_get.ucResponseSize = dwDownloaded;            
    }              
    else
    {
        ucTemp = new u_char[http_get.ucResponseSize + dwDownloaded];    
        memcpy(ucTemp, http_get.ucResponse, http_get.ucResponseSize);                   
        memcpy(&ucTemp[http_get.ucResponseSize], ucBuffer, dwDownloaded);                   

        delete[] http_get.ucResponse;
        http_get.ucResponse = ucTemp;               
        http_get.ucResponseSize += dwDownloaded;
    }              
}
while (true);           

HttpG http_get;
http_get.wszHost = L"au.yahoo.com";
http_get.wszPath = L"/?p=us";
http_get.wszUserAgent = L"Blah blah blah";
http_get.ucResponse = NULL;
http_get.ucResponseSize = 0;

HttpGet(http_get);

// use ucResponse up to ucResponseSize bytes as needed...

delete[] http_get.ucResponse;
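Note that ucResponse in the code above is a raw byte array with no terminator, so it cannot be passed directly to functions that expect a C string. One possible way to bridge that gap is sketched below; MakeCString is a hypothetical helper, and it assumes the response contains no embedded zero bytes that matter to the caller.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: returns a new[]-allocated, null-terminated copy of
// the first `size` bytes of `data`. The caller must delete[] the result.
char* MakeCString(const unsigned char* data, std::size_t size)
{
    char* text = new char[size + 1]; // +1 for the terminator
    if (size != 0)
        std::memcpy(text, data, size);
    text[size] = '\0';               // index `size` is the last valid element
    return text;
}
```

It would be called as MakeCString(http_get.ucResponse, http_get.ucResponseSize) after HttpGet returns. Alternatively, the loop itself could allocate one extra byte and keep the buffer terminated, which avoids the second copy.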