我正在尝试使用C ++将完整的网页保存为.txt文件(Visual Studio 2013)。我正在使用cURL。 一切正常,但我正在尝试保存的网站 - 使用大量的JavaScript来生成页面。因此,当我使用cURL保存网页时,.txt文件只有~170行。 当我使用谷歌浏览器(ctrl + s)将网页保存到.htm文件时,.htm文件有超过2000行。有没有办法将完全加载的网页保存到文件? 这是我正在使用的代码:
struct MemoryStruct {
char *memory;
size_t size;
};
static size_t
WriteMemoryCallback(void *contents, size_t size, size_t nmemb, void *userp)
{
size_t realsize = size * nmemb;
struct MemoryStruct *mem = (struct MemoryStruct *)userp;
mem->memory = (char*)realloc(mem->memory, mem->size + realsize + 1);
if (mem->memory == NULL) {
/* out of memory! */
printf("not enough memory (realloc returned NULL)\n");
return 0;
}
memcpy(&(mem->memory[mem->size]), contents, realsize);
mem->size += realsize;
mem->memory[mem->size] = 0;
return realsize;
}
int main(void)
{
CURL *curl_handle;
CURLcode res;
struct MemoryStruct chunk;
chunk.memory = (char*)malloc(1); /* will be grown as needed by the realloc above */
chunk.size = 0; /* no data at this point */
curl_global_init(CURL_GLOBAL_ALL);
/* init the curl session */
curl_handle = curl_easy_init();
/* specify URL to get */
curl_easy_setopt(curl_handle, CURLOPT_URL, "http://www.example.com/");
/* send all data to this function */
curl_easy_setopt(curl_handle, CURLOPT_WRITEFUNCTION, WriteMemoryCallback);
/* we pass our 'chunk' struct to the callback function */
curl_easy_setopt(curl_handle, CURLOPT_WRITEDATA, (void *)&chunk);
/* some servers don't like requests that are made without a user-agent
field, so we provide one */
curl_easy_setopt(curl_handle, CURLOPT_USERAGENT, "libcurl-agent/1.0");
/* get it! */
res = curl_easy_perform(curl_handle);
/* check for errors */
if (res != CURLE_OK) {
fprintf(stderr, "curl_easy_perform() failed: %s\n",
curl_easy_strerror(res));
}
else {
/*
* Now, our chunk.memory points to a memory block that is chunk.size
* bytes big and contains the remote file.
*
* Do something nice with it!
*/
printf("%lu bytes retrieved\n", (long)chunk.size);
}
std::ofstream oplik;
oplik.open("test.txt");
oplik << chunk.memory;
oplik.close();
/* cleanup curl stuff */
curl_easy_cleanup(curl_handle);
if (chunk.memory)
free(chunk.memory);
/* we're done with libcurl, so clean it up */
curl_global_cleanup();
return 0;
}
感谢您的帮助,对不起我的英语不好。
答案 0 :(得分:1)
cURL只能保存Web服务器提供的内容。
如果您想要保存除此之外的任何内容,您必须包含一个javascript解释器来构建网页,就像任何Web浏览器一样。