Writing and reading weird letters to a txt file using libcurl

Date: 2016-10-21 21:14:31

Tags: html c++ unicode libcurl

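I have a C++ program that downloads the HTML code of a webpage and saves it as a text file using the libcurl library. The problem is that Danish has the following special letters: Æ æ Ø ø Å å. When I read the HTML code back line by line, all of these characters show up as "�". I tried reading and writing the file with wide characters, but that didn't help. I also tried writing a sentence containing "æ", "ø" and "å" to another text file and reading it back, and for some reason that works.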
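Regarding the wide-character attempt: switching to wide streams cannot repair text whose bytes on disk are already in an unexpected encoding, so a useful first check is to look at those raw bytes. The helper below (`to_hex` is a hypothetical name, not part of any library) prints a line as hex pairs. For reference, in UTF-8 "æ" is the two bytes `c3 a6`, while in ISO-8859-1 it is the single byte `e6`.

```cpp
#include <cstdio>
#include <string>

// Return the bytes of s as space-separated lowercase hex pairs,
// e.g. "\xc3\xa6" -> "c3 a6". Each char is cast to unsigned char
// so bytes >= 0x80 are not sign-extended to ffffff..
std::string to_hex(const std::string &s)
{
    std::string out;
    char buf[4];
    for (std::size_t i = 0; i < s.size(); ++i)
    {
        std::snprintf(buf, sizeof buf, "%02x", static_cast<unsigned char>(s[i]));
        if (i > 0)
            out += ' ';
        out += buf;
    }
    return out;
}
```

Inside the loop that reads `example2.txt`, printing `to_hex(wline)` for a line with a broken letter would show which bytes the file actually contains.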

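So my question is: why do the special letters show up as "�" in the downloaded HTML code, but not in the sentence I wrote myself? And how can I fix the HTML output?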
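One way to approach this is to find out which encoding the server says it is sending. After `curl_easy_perform`, libcurl can report the Content-Type response header through `curl_easy_getinfo(curl_handle, CURLINFO_CONTENT_TYPE, &ct)` with `char *ct` (which may be NULL if the header is missing); the page may also declare its encoding in a `<meta charset=...>` tag. The sketch below, using a hypothetical helper `charset_of`, pulls the `charset` parameter out of such a header value. It is a rough parser under simplifying assumptions and does not handle quoted-string escapes.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string>

// Extract the charset parameter from a Content-Type value such as
// "text/html; charset=ISO-8859-1". Returns it lowercased, or an empty
// string if there is none. Simplified: surrounding quotes are stripped,
// but quoted-string escapes are not handled.
std::string charset_of(std::string content_type)
{
    std::transform(content_type.begin(), content_type.end(), content_type.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    const std::string key = "charset=";
    std::size_t pos = content_type.find(key);
    if (pos == std::string::npos)
        return "";
    std::string value = content_type.substr(pos + key.size());
    // The parameter ends at the next ';' (or at the end of the string).
    value = value.substr(0, value.find(';'));
    // Trim whitespace and optional double quotes.
    const char *junk = " \t\"";
    std::size_t first = value.find_first_not_of(junk);
    if (first == std::string::npos)
        return "";
    std::size_t last = value.find_last_not_of(junk);
    return value.substr(first, last - first + 1);
}
```

If this returns something other than `utf-8` while the terminal expects UTF-8, the two encodings disagree.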

My code is as follows:

#include <cstdio>
#include <fstream>
#include <iostream>
#include <string>
#include <curl/curl.h>

using namespace std;

// libcurl write callback. libcurl passes a pointer to raw bytes (char *, not
// string *), and the chunk length is size * nmemb. The bytes are written to the
// FILE * set with CURLOPT_WRITEDATA exactly as received.
static size_t write_data(char *ptr, size_t size, size_t nmemb, void *stream)
{
  return fwrite(ptr, size, nmemb, static_cast<FILE *>(stream));
}

int main()
{
  // Writing weird characters to a text file!
  ofstream myfile("example.txt");
  myfile << "Print strange letter: æ ø og å!";
  myfile.close();

  // Reading weird characters from the text file!
  ifstream wif("example.txt");
  if (wif.is_open())
  {
    string wline;
    while (getline(wif, wline))
      cout << wline << endl;
    wif.close();
  }
  else
    cout << "Could not open example.txt" << endl;

  cout << endl << endl;

  // Download the HTML code from a webpage and save it.
  static const char *pagefilename = "example2.txt";
  const char *charUrl = "http://politiken.dk/forbrugogliv/sundhedogmotion/ECE3406716/mindst-85000-offentligt-ansatte-maa-slet-ikke-ryge-i-arbejdstiden/"; // An article from the Danish newspaper "Politiken"

  curl_global_init(CURL_GLOBAL_ALL);
  CURL *curl_handle = curl_easy_init();
  if (curl_handle)
  {
    curl_easy_setopt(curl_handle, CURLOPT_URL, charUrl);   // HERE IS THE URL PASSED!
    curl_easy_setopt(curl_handle, CURLOPT_VERBOSE, 0L);
    curl_easy_setopt(curl_handle, CURLOPT_NOPROGRESS, 0L);
    curl_easy_setopt(curl_handle, CURLOPT_WRITEFUNCTION, write_data);

    FILE *pagefile = fopen(pagefilename, "wb");
    if (pagefile)
    {
      curl_easy_setopt(curl_handle, CURLOPT_WRITEDATA, pagefile);
      CURLcode res = curl_easy_perform(curl_handle);
      if (res != CURLE_OK)
        cerr << "curl_easy_perform() failed: " << curl_easy_strerror(res) << endl;
      fclose(pagefile);
    }
    curl_easy_cleanup(curl_handle);
  }
  curl_global_cleanup();

  // Reading the HTML code back.
  ifstream webIn(pagefilename);
  if (webIn.is_open())
  {
    string wline;
    while (getline(webIn, wline))
      cout << wline << endl;
  }
  else
    cout << "Could not open example2.txt" << endl;

  return 0;
}
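A libcurl write callback always receives a `char *` to raw, non-NUL-terminated bytes plus `size * nmemb` as the byte count; its fourth argument is whatever pointer was set with `CURLOPT_WRITEDATA`. Declaring the first parameter as `string *` does not make libcurl hand over a `std::string`. If the goal is to collect the page in memory instead of a file, one common pattern is to pass a `std::string *` as `CURLOPT_WRITEDATA` and append to it. A minimal sketch, with the hypothetical callback name `append_to_string`; it would be registered with `curl_easy_setopt(curl_handle, CURLOPT_WRITEFUNCTION, append_to_string)` and `curl_easy_setopt(curl_handle, CURLOPT_WRITEDATA, &body)`:

```cpp
#include <cstddef>
#include <string>

// libcurl-style write callback: ptr points to size * nmemb raw bytes
// (not NUL-terminated); userdata is the pointer given via CURLOPT_WRITEDATA.
// Returning a different count than received makes libcurl abort the transfer.
static std::size_t append_to_string(char *ptr, std::size_t size, std::size_t nmemb, void *userdata)
{
    std::string *body = static_cast<std::string *>(userdata);
    const std::size_t n = size * nmemb;
    body->append(ptr, n);  // bytes are stored unchanged; no encoding conversion
    return n;
}
```

Like the file-based version, this keeps the bytes exactly as the server sent them, so any encoding mismatch still has to be handled separately.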

My output is:

Print strange letter: æ ø og å!


>>HTML-CODE CONTAINING "�" instead of æ ø å <<
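The "�" is U+FFFD REPLACEMENT CHARACTER, which a UTF-8 terminal typically displays for bytes that do not form valid UTF-8. That pattern is consistent with the page having been saved in a single-byte encoding such as ISO-8859-1, but this is an assumption to verify, not a confirmed diagnosis. A simplified check for where the downloaded text stops being valid UTF-8 (the name `first_invalid_utf8` is hypothetical; it checks only lead and continuation byte structure and does not reject overlong forms or surrogates):

```cpp
#include <cstddef>
#include <string>

// Return the byte offset of the first sequence that is not structurally
// valid UTF-8, or std::string::npos if the whole string looks valid.
// Simplified: only checks lead-byte patterns and continuation bytes.
std::size_t first_invalid_utf8(const std::string &s)
{
    std::size_t i = 0;
    while (i < s.size())
    {
        unsigned char c = static_cast<unsigned char>(s[i]);
        std::size_t len;
        if (c < 0x80)
            len = 1;               // ASCII
        else if ((c & 0xE0) == 0xC0)
            len = 2;               // 110xxxxx
        else if ((c & 0xF0) == 0xE0)
            len = 3;               // 1110xxxx
        else if ((c & 0xF8) == 0xF0)
            len = 4;               // 11110xxx
        else
            return i;              // stray continuation byte or invalid lead byte
        if (i + len > s.size())
            return i;              // truncated sequence
        for (std::size_t k = 1; k < len; ++k)
            if ((static_cast<unsigned char>(s[i + k]) & 0xC0) != 0x80)
                return i;          // expected a 10xxxxxx continuation byte
        i += len;
    }
    return std::string::npos;
}
```

For example, the Latin-1 bytes for "æble" fail at offset 0, because `e6` looks like the start of a three-byte sequence but is followed by plain ASCII.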

I don't know whether it's relevant, but my OS is Linux Mint 17.3, and I have set my system's language and region to "English, Denmark UTF-8".
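Given the UTF-8 locale, if the byte inspection or the Content-Type header shows that the page really is ISO-8859-1, converting each line to UTF-8 before printing should make it display correctly. For ISO-8859-1 specifically the conversion is simple, because every byte equals its Unicode code point. The sketch below assumes the input is ISO-8859-1 and uses the hypothetical helper name `latin1_to_utf8`. Windows-1252 differs in the range 0x80 to 0x9F, so for that or any other encoding a general converter such as iconv(3) would be the more appropriate tool.

```cpp
#include <string>

// Convert ISO-8859-1 bytes to UTF-8. Every Latin-1 byte is its own code
// point (U+0000..U+00FF): bytes below 0x80 are copied unchanged, the rest
// become a two-byte sequence 110000xx 10xxxxxx.
std::string latin1_to_utf8(const std::string &in)
{
    std::string out;
    out.reserve(in.size() * 2);
    for (char ch : in)
    {
        unsigned char c = static_cast<unsigned char>(ch);
        if (c < 0x80)
        {
            out += static_cast<char>(c);
        }
        else
        {
            out += static_cast<char>(0xC0 | (c >> 6));
            out += static_cast<char>(0x80 | (c & 0x3F));
        }
    }
    return out;
}
```

In the loop that reads `example2.txt`, this would be used as `cout << latin1_to_utf8(wline) << endl;`. Applying it to text that is already UTF-8 would double-encode it, so it only makes sense once the source encoding is confirmed.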

Thanks in advance! I would really appreciate any help or tips :)

0 answers