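When I open an old database in a SQLite browser, the text is displayed incorrectly. The only encodings I can select are UTF-8 and UTF-16.
When I query the database, the text is already wrong by the time I inspect it in Visual Studio.
I assume the text is ANSI-encoded (Windows-1252), which was confirmed in the comments. I tried converting it to UTF-8: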
var encoding = Encoding.GetEncoding(1252);
byte[] encBytes = encoding.GetBytes(result);
byte[] utf8Bytes = Encoding.Convert(encoding, Encoding.UTF8, encBytes);
return Encoding.UTF8.GetString(utf8Bytes);
Somehow the external legacy application displays the text correctly, so there must be a way. But I'm not sure what to try next.
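Answer 0 (score: 3)
I once had the same problem;
Jon Skeet answered it here:
Basically, take the string, get its bytes using the encoding it was wrongly decoded with, then decode those bytes with the encoding the text is really in: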
string broken = "Brokers MÃ©xico, Intermediario de Aseguro,S.A."; // Get text from database
byte[] encoded = Encoding.GetEncoding(28591).GetBytes(broken);
string corrected = Encoding.UTF8.GetString(encoded);
So in your case it should simply be:
string broken = "Whatever";
byte[] encoded = Encoding.GetEncoding(1252).GetBytes(broken);
string corrected = Encoding.UTF8.GetString(encoded);
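To make the round trip concrete, here is a minimal sketch that first produces the damage (UTF-8 bytes decoded as ISO-8859-1) and then reverses it. The class and method names (`MojibakeDemo`, `Break`, `Repair`) are made up for illustration, and it uses code page 28591 rather than 1252 because 28591 is available without extra setup on every .NET runtime. It assumes the damage really happened this way; if it did not, `Repair` will not help.

```csharp
using System;
using System.Text;

public static class MojibakeDemo
{
    // Simulates the damage: UTF-8 bytes wrongly decoded as ISO-8859-1.
    public static string Break(string text)
    {
        byte[] utf8Bytes = Encoding.UTF8.GetBytes(text);
        return Encoding.GetEncoding(28591).GetString(utf8Bytes);
    }

    // Reverses the damage: recover the original bytes, then decode them as UTF-8.
    public static string Repair(string broken)
    {
        byte[] originalBytes = Encoding.GetEncoding(28591).GetBytes(broken);
        return Encoding.UTF8.GetString(originalBytes);
    }

    public static void Main()
    {
        string broken = Break("M\u00E9xico");     // "MÃ©xico"
        Console.WriteLine(broken == "M\u00C3\u00A9xico");
        Console.WriteLine(Repair(broken) == "M\u00E9xico");
    }
}
```

Note that encoding a string to bytes and decoding it again with the same encoding just returns the original string, which is why a pure `GetBytes` / `Encoding.Convert` / `GetString` chain does not change anything; the fix needs two different encodings.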
Since you know the legacy program manages to convert the text correctly, I would experiment with the encodings listed here:
https://msdn.microsoft.com/en-us/library/system.text.encodinginfo.getencoding(v=vs.110).aspx
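(Just write a program that tries every combination listed there and see which pair produces a match...)
If you know what the source text should be, you can even automate the check: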
public partial class Form1 : Form
{
    public System.Data.DataTable dt;

    public Form1()
    {
        InitializeComponent();
    }

    private void btnTest_Click(object sender, EventArgs e)
    {
        dt = new System.Data.DataTable();
        string correct = "Brokers México, Intermediario de Aseguro,S.A.";
        string broken = "Brokers MÃ©xico, Intermediario de Aseguro,S.A."; // Get text from database
        dt.Columns.Add("SourceEncoding", typeof(string));
        dt.Columns.Add("TargetEncoding", typeof(string));
        dt.Columns.Add("Result", typeof(string));
        dt.Columns.Add("SourceEncodingName", typeof(string));
        dt.Columns.Add("TargetEncodingName", typeof(string));
        // For reference
        // https://msdn.microsoft.com/en-us/library/system.text.encodinginfo.getencoding(v=vs.110).aspx
        int[] encs = new int[] {
             20127 // US-ASCII
            ,28591 // iso-8859-1 Western European (ISO)
            ,28592 // iso-8859-2 Central European (ISO)
            ,28593 // iso-8859-3 Latin 3 (ISO)
            ,28594 // iso-8859-4 Baltic (ISO)
            ,28595 // iso-8859-5 Cyrillic (ISO)
            ,28596 // iso-8859-6 Arabic (ISO)
            ,28597 // iso-8859-7 Greek (ISO)
            ,28598 // iso-8859-8 Hebrew (ISO-Visual)
            ,28599 // iso-8859-9 Turkish (ISO)
            ,28603 // iso-8859-13 Estonian (ISO)
            ,28605 // iso-8859-15 Latin 9 (ISO)
            ,1250 // windows-1250 Central European (Windows)
            ,1251 // windows-1251 Cyrillic (Windows)
            ,1252 // Windows-1252 Western European (Windows)
            ,1253 // windows-1253 Greek (Windows)
            ,1254 // windows-1254 Turkish (Windows)
            ,1255 // windows-1255 Hebrew (Windows)
            ,1256 // windows-1256 Arabic (Windows)
            ,1257 // windows-1257 Baltic (Windows)
            ,1258 // windows-1258 Vietnamese (Windows)
            ,20866 // Cyrillic (KOI8-R)
            ,21866 // Cyrillic (KOI8-U)
            ,65000 // UTF-7
            ,65001 // UTF-8
            ,1200 // UTF-16
            ,1201 // Unicode (Big-Endian)
            ,12000 // UTF-32
            ,12001 // UTF-32BE (UTF-32 Big-Endian)
        };
        // Try every (source, target) pair: encode the broken string with the
        // source encoding, decode the bytes with the target encoding.
        for (int i = 0; i < encs.Length; ++i)
        {
            for (int j = 0; j < encs.Length; ++j)
            {
                System.Data.DataRow dr = dt.NewRow();
                dr["SourceEncoding"] = encs[i];
                dr["TargetEncoding"] = encs[j];
                System.Text.Encoding enci = Encoding.GetEncoding(encs[i]);
                System.Text.Encoding encj = Encoding.GetEncoding(encs[j]);
                byte[] encoded = enci.GetBytes(broken);
                string corrected = encj.GetString(encoded);
                dr["Result"] = corrected;
                dr["SourceEncodingName"] = enci.BodyName;
                dr["TargetEncodingName"] = encj.BodyName;
                // Keep only the pairs that reproduce the known-correct text
                if (StringComparer.InvariantCultureIgnoreCase.Equals(correct, corrected))
                    dt.Rows.Add(dr);
            }
        }
        this.dataGridView1.DataSource = dt;
    }
}
Or, to be even more thorough, simply test every available encoding:
private void btnTestAll_Click(object sender, EventArgs e)
{
    dt = new System.Data.DataTable();
    string correct = "Brokers México, Intermediario de Aseguro,S.A.";
    string broken = "Brokers MÃ©xico, Intermediario de Aseguro,S.A."; // Get text from database
    dt.Columns.Add("SourceEncoding", typeof(string));
    dt.Columns.Add("TargetEncoding", typeof(string));
    dt.Columns.Add("Result", typeof(string));
    dt.Columns.Add("SourceEncodingName", typeof(string));
    dt.Columns.Add("TargetEncodingName", typeof(string));
    // All encodings known to the runtime
    System.Text.EncodingInfo[] encs = System.Text.Encoding.GetEncodings();
    for (int i = 0; i < encs.Length; ++i)
    {
        for (int j = 0; j < encs.Length; ++j)
        {
            System.Data.DataRow dr = dt.NewRow();
            dr["SourceEncoding"] = encs[i].CodePage;
            dr["TargetEncoding"] = encs[j].CodePage;
            System.Text.Encoding enci = System.Text.Encoding.GetEncoding(encs[i].CodePage);
            System.Text.Encoding encj = System.Text.Encoding.GetEncoding(encs[j].CodePage);
            byte[] encoded = enci.GetBytes(broken);
            string corrected = encj.GetString(encoded);
            dr["Result"] = corrected;
            dr["SourceEncodingName"] = enci.BodyName;
            dr["TargetEncodingName"] = encj.BodyName;
            // Keep only the pairs that reproduce the known-correct text
            if (StringComparer.InvariantCultureIgnoreCase.Equals(correct, corrected))
                dt.Rows.Add(dr);
        }
    }
    this.dataGridView1.DataSource = dt;
}
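For anyone who wants to run the same search without a WinForms project, the following is a rough console sketch of the idea. `EncodingPairFinder` and `FindPairs` are hypothetical names, the comparison is a plain ordinal one rather than the culture-insensitive comparison used above, and the set of encodings returned by `Encoding.GetEncodings()` depends on the runtime (on .NET Core and later it is much smaller unless additional providers are registered).

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class EncodingPairFinder
{
    // Returns every (source code page, target code page) pair for which
    // encoding `broken` with the source and decoding with the target yields `correct`.
    public static List<(int Source, int Target)> FindPairs(string broken, string correct)
    {
        var matches = new List<(int Source, int Target)>();
        EncodingInfo[] infos = Encoding.GetEncodings();

        foreach (EncodingInfo src in infos)
        {
            byte[] bytes;
            try { bytes = src.GetEncoding().GetBytes(broken); }
            catch (Exception) { continue; } // skip encodings the runtime cannot instantiate

            foreach (EncodingInfo dst in infos)
            {
                try
                {
                    if (dst.GetEncoding().GetString(bytes) == correct)
                        matches.Add((src.CodePage, dst.CodePage));
                }
                catch (Exception) { /* skip unusable target encodings */ }
            }
        }
        return matches;
    }

    public static void Main()
    {
        // "MÃ©xico" (broken) versus "México" (known-correct)
        foreach (var pair in FindPairs("M\u00C3\u00A9xico", "M\u00E9xico"))
            Console.WriteLine(pair.Source + " -> " + pair.Target);
    }
}
```

With this sample input, the pair 28591 -> 65001 (ISO-8859-1 to UTF-8) should be among the matches; other pairs may show up as well, depending on which encodings are installed.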
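You can download the results here:
Oddly enough, it looks like you can get from German/ANSI (or ISO-8859-1) text to ASCII, but there is no way to convert it back (the information is lost)...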
public static string lol()
{
    string source = "Alu-Dreieckstütze";
    // System.Text.Encoding encSource = System.Text.Encoding.Default;
    System.Text.Encoding encSource = System.Text.Encoding.GetEncoding(28591);
    System.Text.Encoding encTarget = System.Text.Encoding.ASCII;
    byte[] encoded = encSource.GetBytes(source);
    string broken = encTarget.GetString(encoded);
    return broken;
}
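The reason this direction cannot be undone is that the ASCII decoder replaces every byte above 0x7F with a question mark, so the original byte value is gone afterwards. A small sketch of that effect (the names `AsciiLossDemo` and `DecodeLatin1BytesAsAscii` are invented for this example):

```csharp
using System;
using System.Text;

public static class AsciiLossDemo
{
    // Encodes the text as ISO-8859-1 and then decodes those bytes as ASCII.
    public static string DecodeLatin1BytesAsAscii(string source)
    {
        byte[] latin1Bytes = Encoding.GetEncoding(28591).GetBytes(source);
        return Encoding.ASCII.GetString(latin1Bytes);
    }

    public static void Main()
    {
        string broken = DecodeLatin1BytesAsAscii("Alu-Dreieckst\u00FCtze");
        Console.WriteLine(broken); // Alu-Dreieckst?tze

        // Re-encoding only gives back the '?' (0x3F), not the original 0xFC.
        byte[] roundTrip = Encoding.ASCII.GetBytes(broken);
        Console.WriteLine(roundTrip[13].ToString("X2")); // 3F
    }
}
```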
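The interesting part is that, since the legacy application displays the text correctly, the information cannot have been lost.
So are you sure you haven't specified the wrong encoding (or none at all) in the SQLite connection string?
e.g.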
"Data Source=C:\\Users\\USERNAME\\Desktop\\location.db; Version=3; UseUTF16Encoding=True;Synchronous=Normal;New=False"; // set up the connection string
https://www.sqlite.org/c3ref/c_any.html
It seems you can check the database encoding with pragma encoding.
Answer 1 (score: 0)
Two steps:
First, read the value from the database as a byte array.
Second, decode the 1252-encoded byte array into a string.
Something like this:
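// The indexer returns object, so a cast is needed; this only works if the
// provider actually returns the column as a BLOB (byte[]), not as a string.
byte[] buffer = (byte[])dataReader["columnName"];
var encoding = Encoding.GetEncoding(1252); // Windows-1252, as stated above
string s = encoding.GetString(buffer);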
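Which code page is used in the second step matters: Windows-1252 and ISO-8859-1 (28591) agree on most bytes but differ in the 0x80 to 0x9F range. The sketch below decodes the same raw bytes both ways. `LegacyDecodeDemo` and `Decode` are illustrative names, the byte array stands in for a value read from the database, and the `Encoding.RegisterProvider` call assumes .NET Core 3.0 or later, where code page 1252 is not available until the code pages provider is registered.

```csharp
using System;
using System.Text;

public static class LegacyDecodeDemo
{
    static LegacyDecodeDemo()
    {
        // Assumption: running on .NET Core / .NET 5+, where Windows code pages
        // such as 1252 must be enabled explicitly.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    // Decodes raw bytes using the given code page.
    public static string Decode(byte[] raw, int codePage)
    {
        return Encoding.GetEncoding(codePage).GetString(raw);
    }

    public static void Main()
    {
        // "México " followed by byte 0x80
        byte[] raw = { 0x4D, 0xE9, 0x78, 0x69, 0x63, 0x6F, 0x20, 0x80 };

        // 0xE9 is 'é' in both code pages; 0x80 is the euro sign only in 1252.
        Console.WriteLine(Decode(raw, 1252) == "M\u00E9xico \u20AC");
        Console.WriteLine(Decode(raw, 28591) == "M\u00E9xico \u0080");
    }
}
```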
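Answer 2 (score: 0)
I also import data from a source whose strings are incorrectly encoded. With the Microsoft.Data.Sqlite library, it is very easy to inject a user-defined function that fixes the encoding. In this example I also use Dapper: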
using (var cnn = new SqliteConnection($"Data Source={databasePath}")) {
    cnn.CreateFunction("fixencoding", (byte[] value) =>
        Encoding.GetEncoding(1252).GetString(value), isDeterministic: true);
    cnn.Open();
    return cnn.Query<Board>(Properties.Resources.GetBoards);
}
For this class:
public class Board
{
    public string Code { get; set; }
    public string Description { get; set; }
    public decimal Length { get; set; }
    public decimal Width { get; set; }
    public decimal Thickness { get; set; }
    public int Quantity { get; set; }
}
And this query (Properties.Resources.GetBoards):
SELECT
    fixencoding(CODE) AS Code,
    fixencoding(DESC) AS Description,
    LNGT AS Length,
    WIDT AS Width,
    THCK AS Thickness,
    QNTY AS Quantity
FROM
    BOARDS
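If the source uses the same system locale as your machine, you can simply use Encoding.Default.GetString(value) instead of Encoding.GetEncoding(1252).GetString(value). This applies to .NET Framework, where Encoding.Default is the system ANSI code page; on .NET Core and later, Encoding.Default is always UTF-8.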