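I have a C++ program that downloads the HTML of a web page with libcurl and saves it to a text file. The problem is that Danish has these unusual letters: Æ æ Ø ø Å å. When I read the HTML back line by line, all of these characters show up as "�". I tried reading and writing the file as wide characters, but that did not help. I also tried writing a sentence containing "æ", "ø" and "å" to another text file and reading it back. For some reason, that works.
So my question is: why do these letters show up as "�" in the downloaded HTML but not in the sentence I write myself, and how can I fix the HTML output?
My code is as follows: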
#include <cstdio>
#include <fstream>
#include <iostream>
#include <string>
#include <curl/curl.h>
using namespace std;

// libcurl write callback: writes each received chunk to the FILE* set via CURLOPT_WRITEDATA.
static size_t write_data (char *ptr, size_t size, size_t nmemb, void *stream)
{
    // fwrite returns the number of items written; libcurl expects the number of bytes handled.
    size_t written = fwrite (ptr, size, nmemb, (FILE *) stream);
    return written * size;
    //string myString (ptr, nbytes);
}
int main ()
{
    // Writing weird characters to a text file!
    ofstream myfile;
    myfile.open ("example.txt");
    myfile << "Print strange letter: æ ø og å!";
    myfile.close ();

    // Reading weird characters from the text file!
    std::ifstream wif ("example.txt");
    if (wif.is_open ())
    {
        std::string wline;
        while (std::getline (wif, wline))
        {
            cout << wline << endl;
        }
    }
    else
        cout << "Could not open example.txt" << endl;
    wif.close ();
    cout << endl << endl;

    // Download the HTML code from a web page and save it.
    CURL *curl_handle;
    static const char *pagefilename = "example2.txt";
    const char *charUrl = "http://politiken.dk/forbrugogliv/sundhedogmotion/ECE3406716/mindst-85000-offentligt-ansatte-maa-slet-ikke-ryge-i-arbejdstiden/"; // An article from the Danish newspaper "Politiken"
    FILE *pagefile;
    curl_global_init (CURL_GLOBAL_ALL);
    curl_handle = curl_easy_init ();
    curl_easy_setopt (curl_handle, CURLOPT_URL, charUrl); // HERE IS THE URL PASSED!
    curl_easy_setopt (curl_handle, CURLOPT_VERBOSE, 0L);
    curl_easy_setopt (curl_handle, CURLOPT_NOPROGRESS, 0L);
    curl_easy_setopt (curl_handle, CURLOPT_WRITEFUNCTION, write_data);
    pagefile = fopen (pagefilename, "wb");
    if (pagefile)
    {
        curl_easy_setopt (curl_handle, CURLOPT_WRITEDATA, pagefile);
        curl_easy_perform (curl_handle);
        fclose (pagefile);
    }
    curl_easy_cleanup (curl_handle);
    curl_global_cleanup ();

    // Reading the HTML code
    ifstream webIn ("example2.txt");
    if (webIn.is_open ())
    {
        std::string wline;
        while (getline (webIn, wline))
        {
            cout << wline << endl;
        }
    }
    else
        cout << "Could not open example2.txt" << endl;
    return 0;
}
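The commented-out `string myString (ptr, nbytes);` line was an attempt to collect the page in a `std::string` instead of a file. A minimal sketch of what such a callback might look like, assuming the user-data pointer is a `std::string *`; `append_to_string` is a hypothetical name and is not part of the program above:

```cpp
#include <cstddef>
#include <string>

// Same parameter layout as a libcurl write callback:
// (char *data, size_t size, size_t nmemb, void *userdata).
// Assumption: userdata points to a std::string owned by the caller.
static size_t append_to_string(char *data, size_t size, size_t nmemb, void *userdata)
{
    const size_t nbytes = size * nmemb;                  // bytes in this chunk
    std::string *buffer = static_cast<std::string *>(userdata);
    buffer->append(data, nbytes);                        // copy raw bytes, no decoding
    return nbytes;                                       // report the whole chunk as handled
}
```

It would be registered with `CURLOPT_WRITEFUNCTION`, with the address of a `std::string` passed as `CURLOPT_WRITEDATA`. The bytes are stored exactly as received, so this alone would not change how the letters are displayed.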
My output is:
Print strange letter: æ ø og å!
>>HTML-CODE CONTAINING "�" instead of æ ø å <<
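One way to narrow this down would be to look at the raw bytes instead of the rendered characters. In UTF-8, "æ" is the two bytes c3 a6, while in ISO-8859-1 it is the single byte e6, which is not valid UTF-8 on its own and is typically displayed as "�" by a UTF-8 terminal. The helper below is only a sketch; `to_hex` is a hypothetical name, not something from my program:

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: renders every byte of `text` as two lowercase
// hex digits, separated by single spaces.
std::string to_hex(const std::string &text)
{
    std::string out;
    char buf[4];
    for (unsigned char byte : text) {
        std::snprintf(buf, sizeof buf, "%02x", static_cast<unsigned>(byte));
        if (!out.empty())
            out += ' ';
        out += buf;
    }
    return out;
}
```

Calling it on a line of example2.txt that should contain "æ" would show which of the two byte patterns the server actually sent.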
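I don't know whether it is relevant, but my operating system is Linux Mint 17.3, and my system language and region are set to "English, Denmark UTF-8".
Thanks in advance! I would really appreciate any help or hints :)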
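If the page turns out to be served as ISO-8859-1 (an assumption, I have not verified it), then converting the downloaded bytes to UTF-8 before printing should make them display correctly on a UTF-8 system like this one. A minimal sketch of such a conversion; `latin1_to_utf8` is a hypothetical helper and only handles ISO-8859-1 input:

```cpp
#include <string>

// Hypothetical helper: treats every input byte as one ISO-8859-1 code point
// (U+0000..U+00FF) and re-encodes it as UTF-8.
std::string latin1_to_utf8(const std::string &latin1)
{
    std::string utf8;
    utf8.reserve(latin1.size() * 2);
    for (unsigned char c : latin1) {
        if (c < 0x80) {
            // ASCII range: identical in both encodings.
            utf8 += static_cast<char>(c);
        } else {
            // U+0080..U+00FF: two-byte UTF-8 sequence 110xxxxx 10xxxxxx.
            utf8 += static_cast<char>(0xC0 | (c >> 6));
            utf8 += static_cast<char>(0x80 | (c & 0x3F));
        }
    }
    return utf8;
}
```

For any other source encoding this shortcut would not apply, and a general conversion library such as iconv would be the more likely tool.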