Decoding Cyrillic data from Windows-1251 to ISO-8859-1

Time: 2014-03-12 21:48:04

Tags: encoding cyrillic

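I have an old database in which some columns contain Cyrillic data that is unreadable and needs to be converted. As a trial, I wrote the code below, but the result is not what I expected. Can anyone point out the problem and/or suggest how to convert the data?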

using System;
using System.Text;
using System.Windows.Forms;

namespace ConvertEncoding
{
    public partial class Form1 : Form
    {
        public Form1()
        {
            InitializeComponent();
        }
        Encoding cp1251 = Encoding.GetEncoding("windows-1251");
        Encoding iso8859 = Encoding.GetEncoding("iso-8859-1");

        private void button1_Click(object sender, EventArgs e)
        {
            byte[] cp1251Bytes = cp1251.GetBytes("Ñîáëþäåíèå ïðàâ äåòåé â äåòñêèõ äîìàõ Êûðãûçñêîé Ðåñïóáëèêè");
            byte[] iso8859Bytes = Encoding.Convert(cp1251, iso8859, cp1251Bytes);
            string iso8859String = iso8859.GetString(iso8859Bytes);
            label1.Text = iso8859String;
            // Sample Cyrillic text should convert to: 
            // Соблюдение прав детей в детских домах Кыргызской Республики
        }
    }
}

1 answer:

Answer 0 (score: 0)

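The string "Ñîáëþäåíèå ïðàâ äåòåé â äåòñêèõ äîìàõ Êûðãûçñêîé Ðåñïóáëèêè" is clearly windows-1251 data that has been read as iso-8859-1. You need to go from iso-8859-1 back to windows-1251, but you are doing the opposite.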

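Inside button1_Click, swap the roles of cp1251 and iso8859: get the bytes of the string with iso8859, then decode those bytes with cp1251, and you will see the correct result.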

You probably want to work with the data as Unicode, in which case convert it like this:

Encoding utf8 = Encoding.GetEncoding("utf-8");
Encoding iso8859 = Encoding.GetEncoding("iso-8859-1");
Encoding cp1251 = Encoding.GetEncoding("windows-1251");

private void button1_Click(object sender, EventArgs e)
{
    byte[] bytes = iso8859.GetBytes("Ñîáëþäåíèå ïðàâ äåòåé â äåòñêèõ äîìàõ Êûðãûçñêîé Ðåñïóáëèêè"); // get bytes in source encoding
    // but they are actually cp1251 so...
    string utf8string = utf8.GetString(Encoding.Convert(cp1251, utf8, bytes)); // convert them from cp1251 to utf8
    label1.Text = utf8string;
}
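
For comparison, here is a minimal console sketch of the same repair without WinForms. It rests on the assumption made in this answer: the garbled text is windows-1251 bytes that were read as iso-8859-1, so encoding it with iso-8859-1 recovers the original bytes, and decoding those with windows-1251 gives the Cyrillic text. The class and method names (MojibakeRepair, FixCp1251ReadAsLatin1) are made up for illustration. The RegisterProvider call assumes .NET Core / .NET 5 or later, where windows-1251 is not available until the code pages provider is registered; on .NET Framework that line should not be needed.

```csharp
using System;
using System.Text;

public static class MojibakeRepair
{
    static MojibakeRepair()
    {
        // Assumption: running on .NET Core / .NET 5+, where legacy code pages
        // such as windows-1251 must be registered before use.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    // Hypothetical helper: repairs text that was stored as windows-1251
    // but decoded as iso-8859-1.
    public static string FixCp1251ReadAsLatin1(string garbled)
    {
        Encoding latin1 = Encoding.GetEncoding("iso-8859-1");
        Encoding cp1251 = Encoding.GetEncoding("windows-1251");

        // Each garbled character maps back to exactly one original byte.
        byte[] rawBytes = latin1.GetBytes(garbled);

        // Interpret those bytes with the encoding they were written in.
        return cp1251.GetString(rawBytes);
    }

    public static void Main()
    {
        Console.OutputEncoding = Encoding.UTF8;
        string garbled = "Ñîáëþäåíèå ïðàâ äåòåé â äåòñêèõ äîìàõ Êûðãûçñêîé Ðåñïóáëèêè";
        // Should print the text given in the question:
        // Соблюдение прав детей в детских домах Кыргызской Республики
        Console.WriteLine(FixCp1251ReadAsLatin1(garbled));
    }
}
```

The resulting .NET string is already Unicode, so no separate conversion to UTF-8 is needed until the text is written back to a file or database. This only works if the garbled text really went through iso-8859-1; if the data was read as some other code page, the first encoding in the helper would have to change accordingly.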