Data gets added to curl's retrieved content

Date: 2016-04-20 23:53:13

Tags: c++ curl

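I'm using CURL with C++ to fetch a website's source code. A callback function stores the content in a string, but I end up with extra data (a 0 and several newlines).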

Here is my code (not all of it, since the project is fairly large).

This is the function that receives the content and puts it into the string:

size_t writefunc(void *ptr, size_t size, size_t nmemb, string pContent)
{
    pContent += (char *)ptr;
    return size*nmemb;
}

And here is how I initialize the CURL object:

string Content;
CURL *pCURL = curl_easy_init();
if(!pCURL)
{
    cout << "Couldn't create a curl object" << endl;
    return 0;
}
curl_easy_setopt(pCURL, CURLOPT_WRITEFUNCTION, writefunc);
curl_easy_setopt(pCURL, CURLOPT_FOLLOWLOCATION, true);
curl_easy_setopt(pCURL, CURLOPT_COOKIEJAR, "cookie_file.txt");
curl_easy_setopt(pCURL, CURLOPT_WRITEDATA, &Content);
curl_easy_setopt(pCURL, CURLOPT_POST, true);

1 answer:

Answer 0 (score: 1)

Tested on OS X El Capitan 10.11.4 with Xcode 7.3. It works.

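Note: If you need an SSL connection, just add #define USE_SSL. If you also need to guarantee that the peer or host presents a proper certificate, change the verification options (CURLOPT_SSL_VERIFYPEER, CURLOPT_SSL_VERIFYHOST): the code below sets both to 0L, which turns verification off, whereas 1L for CURLOPT_SSL_VERIFYPEER and 2L for CURLOPT_SSL_VERIFYHOST turn it on.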

You also don't need many of the options I set in the code below.

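Edit: I see the problem in your code. You are performing a POST request. What you actually want is a GET request, because you want to retrieve the page's source.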

//
//  main.cpp
//  TestCurl
//
//  Created by Brandon T on 2016-04-21.
//  Copyright © 2016 XIO. All rights reserved.
//

#include <iostream>
#include <string>
#include <curl/curl.h>


size_t writefunc(void *contents, size_t size, size_t nmemb, void *userp)
{
    std::string *page_source = static_cast<std::string *>(userp);

    if (page_source)
    {
        page_source->append(static_cast<char *>(contents), size * nmemb);
    }

    return size * nmemb;
}

int main(int argc, const char * argv[])
{

    std::string page_url = "http://stackoverflow.com/questions/36757217/data-gets-added-to-curls-retrieved-content?noredirect=1#comment61096734_36757217";

    std::string user_agent = "Mozilla/5.0 (Windows NT 6.2; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1667.0 Safari/537.36";

    std::string page_source;


    CURL *curl_handle = curl_easy_init();
    if (curl_handle)
    {
        curl_easy_setopt(curl_handle, CURLOPT_FAILONERROR, 1L);
        curl_easy_setopt(curl_handle, CURLOPT_USERAGENT, user_agent.c_str());
        curl_easy_setopt(curl_handle, CURLOPT_WRITEFUNCTION, writefunc);
        curl_easy_setopt(curl_handle, CURLOPT_WRITEDATA, &page_source);
        curl_easy_setopt(curl_handle, CURLOPT_AUTOREFERER, 1L);

        #ifdef USE_SSL
        curl_easy_setopt(curl_handle, CURLOPT_USE_SSL, CURLUSESSL_TRY);
        curl_easy_setopt(curl_handle, CURLOPT_SSL_VERIFYPEER, 0L);  // 0L = no peer verification; use 1L to verify
        curl_easy_setopt(curl_handle, CURLOPT_SSL_VERIFYHOST, 0L);  // 0L = no host name check; use 2L to verify
        #endif

        curl_easy_setopt(curl_handle, CURLOPT_COOKIEJAR, "cookies.txt");
        curl_easy_setopt(curl_handle, CURLOPT_COOKIEFILE, "cookies.txt");

        curl_easy_setopt(curl_handle, CURLOPT_VERBOSE, 1L);
        curl_easy_setopt(curl_handle, CURLOPT_URL, page_url.c_str());
        curl_easy_setopt(curl_handle, CURLOPT_UPLOAD, 0L);
        curl_easy_setopt(curl_handle, CURLOPT_POST, 0L);

        CURLcode res = curl_easy_perform(curl_handle);

        if (res != CURLE_OK)
        {
            std::string error_message = curl_easy_strerror(res);
            curl_easy_cleanup(curl_handle);

            std::cerr << error_message;
            return 0;
        }

        curl_easy_cleanup(curl_handle);

        std::cout << page_source;
    }


    return 0;
}

Result:

*   Trying 104.16.35.249...
* Connected to stackoverflow.com (104.16.35.249) port 80 (#0)
> GET /questions/36757217/data-gets-added-to-curls-retrieved-content?noredirect=1 HTTP/1.1
Host: stackoverflow.com
User-Agent: Mozilla/5.0 (Windows NT 6.2; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1667.0 Safari/537.36
Accept: */*
Cookie: __cfduid=ddbf5d3e848c27dcbc1fded421106e2311461286187; prov=b57e8199-4ea1-4ad3-a9bb-cad71f707835

< HTTP/1.1 200 OK
< Date: Fri, 22 Apr 2016 00:59:16 GMT
< Content-Type: text/html; charset=utf-8
< Transfer-Encoding: chunked
< Connection: keep-alive
< Cache-Control: public, max-age=60
< Expires: Fri, 22 Apr 2016 01:00:16 GMT
< Last-Modified: Fri, 22 Apr 2016 00:59:16 GMT
< Vary: *
< X-Frame-Options: SAMEORIGIN
< X-Request-Guid: 36d372e7-2da8-4c7f-ab22-ecd8cb96fa39
< Server: cloudflare-nginx
< CF-RAY: 297521d23dac016a-ORD
< 
* Connection #0 to host stackoverflow.com left intact
<!DOCTYPE html>
<html itemscope itemtype="http://schema.org/QAPage">
<head>

Plus the rest of this page's source.