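I'm using CURL and C++ to fetch a website's source code. I store the content in a string via a callback function, but I get extra data (a 0 and several newlines).
Here is my code (not the whole thing, because the project is fairly large).
This is the function that receives the content and puts it into a string:
size_t writefunc(void *ptr, size_t size, size_t nmemb, string pContent)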
{
pContent += (char *)ptr;
return size*nmemb;
}
Here is how I initialize the CURL object:
string Content;
CURL *pCURL = curl_easy_init();
if(!pCURL)
{
cout << "Couldn't create a curl object" << endl;
return 0;
}
curl_easy_setopt(pCURL, CURLOPT_WRITEFUNCTION, writefunc);
curl_easy_setopt(pCURL, CURLOPT_FOLLOWLOCATION, true);
curl_easy_setopt(pCURL, CURLOPT_COOKIEJAR, "cookie_file.txt");
curl_easy_setopt(pCURL, CURLOPT_WRITEDATA, &Content);
curl_easy_setopt(pCURL, CURLOPT_POST, true);
Answer 0 (score: 1)
Tested on OSX El-Capitan 10.11.4 with Xcode 7.3. It works.
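Note:
If you need an SSL connection, just add #define USE_SSL and change the verification options (CURLOPT_SSL_VERIFYPEER and CURLOPT_SSL_VERIFYHOST) if you need to guarantee that the peer or host has a proper certificate.
You also don't need many of the options I set in the code below.
Edit: I see the problem in your code. You are performing a POST request. What you really want is a GET request, since you want to fetch the page's source.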
//
// main.cpp
// TestCurl
//
// Created by Brandon T on 2016-04-21.
// Copyright © 2016 XIO. All rights reserved.
//
#include <iostream>
#include <string>
#include <curl/curl.h>

// libcurl calls this for each chunk of the response body. The buffer is not
// NUL-terminated, so append exactly size * nmemb bytes.
size_t writefunc(void *contents, size_t size, size_t nmemb, void *userp)
{
    std::string *page_source = static_cast<std::string *>(userp);
    if (page_source)
    {
        page_source->append(static_cast<char *>(contents), size * nmemb);
    }
    return size * nmemb;
}

int main(int argc, const char * argv[])
{
    std::string page_url = "http://stackoverflow.com/questions/36757217/data-gets-added-to-curls-retrieved-content?noredirect=1#comment61096734_36757217";
    std::string user_agent = "Mozilla/5.0 (Windows NT 6.2; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1667.0 Safari/537.36";
    std::string page_source;

    CURL *curl_handle = curl_easy_init();
    if (curl_handle)
    {
        curl_easy_setopt(curl_handle, CURLOPT_FAILONERROR, 1L);
        curl_easy_setopt(curl_handle, CURLOPT_USERAGENT, user_agent.c_str());
        curl_easy_setopt(curl_handle, CURLOPT_WRITEFUNCTION, writefunc);
        curl_easy_setopt(curl_handle, CURLOPT_WRITEDATA, &page_source);
        curl_easy_setopt(curl_handle, CURLOPT_AUTOREFERER, 1L);

#ifdef USE_SSL
        curl_easy_setopt(curl_handle, CURLOPT_USE_SSL, CURLUSESSL_TRY);
        curl_easy_setopt(curl_handle, CURLOPT_SSL_VERIFYPEER, 0L); // 1L to verify the peer's certificate
        curl_easy_setopt(curl_handle, CURLOPT_SSL_VERIFYHOST, 0L); // 2L to verify the host name
#endif

        curl_easy_setopt(curl_handle, CURLOPT_COOKIEJAR, "cookies.txt");
        curl_easy_setopt(curl_handle, CURLOPT_COOKIEFILE, "cookies.txt");
        curl_easy_setopt(curl_handle, CURLOPT_VERBOSE, 1L);
        curl_easy_setopt(curl_handle, CURLOPT_URL, page_url.c_str());

        // Neither an upload nor a POST: this is a plain GET request.
        curl_easy_setopt(curl_handle, CURLOPT_UPLOAD, 0L);
        curl_easy_setopt(curl_handle, CURLOPT_POST, 0L);

        CURLcode res = curl_easy_perform(curl_handle);
        if (res != CURLE_OK)
        {
            std::string error_message = curl_easy_strerror(res);
            curl_easy_cleanup(curl_handle);
            std::cerr << error_message << '\n';
            return 1; // report the failure to the caller
        }

        curl_easy_cleanup(curl_handle);
        std::cout << page_source;
    }
    return 0;
}
Result:
* Trying 104.16.35.249...
* Connected to stackoverflow.com (104.16.35.249) port 80 (#0)
> GET /questions/36757217/data-gets-added-to-curls-retrieved-content?noredirect=1 HTTP/1.1
Host: stackoverflow.com
User-Agent: Mozilla/5.0 (Windows NT 6.2; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1667.0 Safari/537.36
Accept: */*
Cookie: __cfduid=ddbf5d3e848c27dcbc1fded421106e2311461286187; prov=b57e8199-4ea1-4ad3-a9bb-cad71f707835
< HTTP/1.1 200 OK
< Date: Fri, 22 Apr 2016 00:59:16 GMT
< Content-Type: text/html; charset=utf-8
< Transfer-Encoding: chunked
< Connection: keep-alive
< Cache-Control: public, max-age=60
< Expires: Fri, 22 Apr 2016 01:00:16 GMT
< Last-Modified: Fri, 22 Apr 2016 00:59:16 GMT
< Vary: *
< X-Frame-Options: SAMEORIGIN
< X-Request-Guid: 36d372e7-2da8-4c7f-ab22-ecd8cb96fa39
< Server: cloudflare-nginx
< CF-RAY: 297521d23dac016a-ORD
<
* Connection #0 to host stackoverflow.com left intact
<!DOCTYPE html>
<html itemscope itemtype="http://schema.org/QAPage">
<head>
...plus the rest of this page's source code.
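One more detail that may matter for the "extra data" symptom: libcurl documents that the buffer passed to the write callback is not NUL-terminated, so a callback should append exactly size * nmemb bytes, as the writefunc in the answer does. Appending the pointer as a C string, as `pContent += (char *)ptr` does in the question, can pick up whatever bytes follow the chunk. The sketch below illustrates this without libcurl by calling two callbacks of the same shape directly. The names `append_exact`, `append_as_cstring` and `collect`, and the buffer contents, are hypothetical and chosen only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Same shape as a libcurl write callback: appends exactly size * nmemb bytes.
size_t append_exact(void *contents, size_t size, size_t nmemb, void *userp)
{
    auto *out = static_cast<std::string *>(userp);
    if (out)
        out->append(static_cast<char *>(contents), size * nmemb);
    return size * nmemb;
}

// Mimics the question's approach: treats the chunk as a C string and
// therefore keeps reading until it happens to hit a NUL byte.
size_t append_as_cstring(void *contents, size_t size, size_t nmemb, void *userp)
{
    auto *out = static_cast<std::string *>(userp);
    if (out)
        *out += static_cast<char *>(contents);
    return size * nmemb;
}

using WriteCallback = size_t (*)(void *, size_t, size_t, void *);

// Hypothetical driver standing in for libcurl: delivers two chunks carved
// out of a larger buffer, so neither chunk ends with a NUL byte.
std::string collect(WriteCallback cb)
{
    assert(cb != nullptr);
    char buffer[] = "<html></html>GARBAGE"; // made-up data; "GARBAGE" is not part of the page
    std::string page;
    cb(buffer, 1, 6, &page);     // first chunk:  "<html>"
    cb(buffer + 6, 1, 7, &page); // second chunk: "</html>"
    return page;
}
```

Calling `collect(append_exact)` yields only the 13 bytes that were delivered, while `collect(append_as_cstring)` also drags in the trailing bytes after each chunk. In real use the bytes after a chunk are unspecified, so the C-string variant could just as well read past the end of the buffer.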