WinHTTP changes the text encoding of a downloaded file

Time: 2019-05-26 10:45:28

Tags: download character-encoding winhttp

I am testing the WinHTTP functions with the simple C program below:

#define WIN32_LEAN_AND_MEAN  /* windows.h checks WIN32_LEAN_AND_MEAN; WINDOWS_LEAN_AND_MEAN is ignored */
#include <windows.h>
#include <stdio.h>
#include <winhttp.h>

#pragma comment(lib,"winhttp")

int WINAPI wWinMain(HINSTANCE hInstance, HINSTANCE hPrevInstance,
    PWSTR pCmdLine, int nCmdShow)
{
    HINTERNET hSession = NULL,
        hConnect = NULL,
        hRequest = NULL;

    hSession = WinHttpOpen(L"Agent Smith",
        WINHTTP_ACCESS_TYPE_AUTOMATIC_PROXY,
        WINHTTP_NO_PROXY_NAME,
        WINHTTP_NO_PROXY_BYPASS,
        0);

    /* binary mode: store the bytes exactly as received, without LF -> CRLF translation */
    FILE *fpOutput = fopen("download.html", "wb");

    if (fpOutput != NULL)
    {
        if (hSession)
        {
            hConnect = WinHttpConnect(hSession,
                L"www.google.com.tr",
                INTERNET_DEFAULT_HTTPS_PORT, 0);

            if (hConnect)
            {
                hRequest = WinHttpOpenRequest(hConnect,
                    L"GET",
                    NULL,
                    NULL,
                    WINHTTP_NO_REFERER,
                    WINHTTP_DEFAULT_ACCEPT_TYPES,
                    WINHTTP_FLAG_SECURE);

                if (hRequest)
                {
                    BOOL bResult = FALSE;
                    bResult = WinHttpSendRequest(hRequest,
                        WINHTTP_NO_ADDITIONAL_HEADERS,
                        0, WINHTTP_NO_REQUEST_DATA, 0,
                        0, 0);

                    if (bResult)
                    {
                        bResult = WinHttpReceiveResponse(hRequest, NULL);
                        if (bResult)
                        {
                            char buffer[4096];
                            DWORD downloaded;
                            while (WinHttpReadData(hRequest, buffer, 4096, &downloaded))
                            {
                                if (downloaded == 0)
                                    break;

                                fwrite(buffer, 1, downloaded, fpOutput);
                            }
                        }
                    }

                    WinHttpCloseHandle(hRequest);
                }
                WinHttpCloseHandle(hConnect);
            }
            WinHttpCloseHandle(hSession);
        }
        fclose(fpOutput);
    }

    return 0;
}

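However, when I open the downloaded file in a web browser, I can see that it was saved with the wrong encoding. The file appears to be ANSI-encoded, whereas according to the HTML meta tag it should be UTF-8. I can't find anything in the documentation saying that WinHTTP performs automatic text-encoding conversion. If it does, how can I disable it? If it doesn't, what could be causing this problem?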
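
As far as the WinHTTP documentation goes, WinHttpReadData returns the response body as bytes with no charset conversion. The charset the server declares travels in the Content-Type response header, which is not saved into download.html, so a browser opening the local file only has the meta tag (or its own guess) to go on. One thing worth checking is whether that header declares a different charset than the meta tag. After WinHttpReceiveResponse succeeds, the header value can be read with WinHttpQueryHeaders(hRequest, WINHTTP_QUERY_CONTENT_TYPE, WINHTTP_HEADER_NAME_BY_INDEX, ...), which returns a wide string. The sketch below is a hypothetical helper, extract_charset, in portable C; it assumes the header value has already been converted to a narrow ASCII string and pulls out the charset parameter.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Case-insensitive check that s begins with prefix (ASCII only). */
static int starts_with_ci(const char *s, const char *prefix)
{
    while (*prefix != '\0') {
        if (tolower((unsigned char)*s) != tolower((unsigned char)*prefix))
            return 0;          /* mismatch, or s ended early */
        s++;
        prefix++;
    }
    return 1;
}

/* Hypothetical helper: copies the charset parameter of a Content-Type value
   such as "text/html; charset=ISO-8859-9" into out (truncated to fit).
   Returns 1 if a non-empty charset was found, 0 otherwise. */
int extract_charset(const char *content_type, char *out, size_t out_size)
{
    const char *p = content_type;
    const size_t key_len = strlen("charset=");

    if (out_size == 0)
        return 0;
    out[0] = '\0';

    /* Parameters follow the media type, each introduced by ';'. */
    while ((p = strchr(p, ';')) != NULL) {
        p++;                                   /* skip the ';' */
        while (*p == ' ' || *p == '\t')
            p++;                               /* skip optional whitespace */
        if (!starts_with_ci(p, "charset="))
            continue;                          /* some other parameter */

        p += key_len;
        if (*p == '"')
            p++;                               /* quoted value */

        size_t n = 0;
        while (p[n] != '\0' && p[n] != ';' && p[n] != '"' &&
               p[n] != ' ' && p[n] != '\t' && n + 1 < out_size) {
            out[n] = p[n];
            n++;
        }
        out[n] = '\0';
        return n > 0;
    }
    return 0;
}
```

For "text/html; charset=ISO-8859-9" the helper yields ISO-8859-9; a value without a charset parameter, such as plain "text/html", returns 0.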

Edit:

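When I download an image with the same code, it displays correctly, so I don't think WinHTTP affects the binary integrity of downloaded resources. Why the downloaded HTML file comes out corrupted is still a mystery.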
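
The reasoning in the edit can also be checked directly on the saved bytes. The sketch below is a hypothetical helper, is_valid_utf8, that tests whether a buffer is well-formed UTF-8 as defined in RFC 3629; it could be run over the contents of download.html or over the bytes collected from WinHttpReadData. If the check fails, the server most likely sent a non-UTF-8 encoding regardless of what the meta tag says; if it passes, the bytes are plausibly UTF-8 and the problem more likely lies in how the file is being interpreted. Pure ASCII also passes, so a pass is suggestive rather than proof.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: returns true if data[0..len) is well-formed UTF-8
   (RFC 3629). Rejects overlong forms, UTF-16 surrogates (U+D800..U+DFFF),
   code points above U+10FFFF and truncated sequences. */
bool is_valid_utf8(const unsigned char *data, size_t len)
{
    size_t i = 0;
    while (i < len) {
        unsigned char c = data[i];
        size_t need = 0;                      /* continuation bytes expected */
        unsigned char lo = 0x80, hi = 0xBF;   /* allowed range of the 2nd byte */

        if (c <= 0x7F) { i++; continue; }                 /* ASCII */
        else if (c >= 0xC2 && c <= 0xDF) need = 1;
        else if (c == 0xE0) { need = 2; lo = 0xA0; }      /* no overlongs */
        else if (c >= 0xE1 && c <= 0xEC) need = 2;
        else if (c == 0xED) { need = 2; hi = 0x9F; }      /* no surrogates */
        else if (c >= 0xEE && c <= 0xEF) need = 2;
        else if (c == 0xF0) { need = 3; lo = 0x90; }      /* no overlongs */
        else if (c >= 0xF1 && c <= 0xF3) need = 3;
        else if (c == 0xF4) { need = 3; hi = 0x8F; }      /* <= U+10FFFF */
        else return false;   /* 0x80..0xC1 and 0xF5..0xFF never start a sequence */

        if (len - i - 1 < need)
            return false;                     /* sequence cut off at end of buffer */
        if (data[i + 1] < lo || data[i + 1] > hi)
            return false;
        for (size_t k = 2; k <= need; k++)
            if (data[i + k] < 0x80 || data[i + k] > 0xBF)
                return false;                 /* not a continuation byte */
        i += need + 1;
    }
    return true;
}
```

For example, the two bytes 0xFC 0xFE ("üş" in ISO-8859-9) are rejected, while 0xC3 0xBC 0xC5 0x9F (the same text in UTF-8) are accepted.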

0 answers