给出编码的名称,如何获得相应的codePage identifier?
例如:
28591
1252
500
1200
65001
假想的用例: Windows函数MultiByteToWideChar仅使用CodePage
,而我只有encodingName。
EnumSystemCodePages返回一个字符串列表,而不是代码页标识符(因此您不能将它们传递给 GetCPInfo )。
答案 0 :(得分:2)
您的要求没有Win32 API。
如果可以使用.NET,则可以使用System.Text.Encoding
方法从编码名称创建Encoding.GetEncoding(String)
类的对象实例,然后可以读取其CodePage
属性。
否则,您可以查看以下注册表项:
HKEY_CLASSES_ROOT\MIME\Database\Codepage
HKEY_CLASSES_ROOT\Mime\Database\Charset
但是,请注意注册表中存在一些错误。例如,iso-8859-1
映射到代码页1252,而不是首选28591。
否则,您只需要根据需要在代码中创建自己的查找表即可。
答案 1 :(得分:1)
IMultiLanguage::GetCharsetInfo
会将“ shift_jis”和“ gb2312”之类的字符串转换为代码页932和936,但是它不知道所有名称,因此您还必须提供自己的查找表。我怀疑它使用与其他答案之一相同的注册表项。
| Charset | CodePage |
|----------------|-------------------|
| "windows-1252" | 1252 |
| "utf-8" | 1200 |
| "utf-16" | Unspecified error |
| "iso-8859-1" | 1252 (wrong) |
| "ibm850" | Unspecified error |
答案 2 :(得分:0)
由于Windows中无法执行此操作,因此必须创建自己的版本(总是错误,总是不完整):
function EncodingNameToCodePage(EncodingName: string): Cardinal;
const
cp: array[0..151] of record CodePage: Cardinal; Name: string end = //Thank you John Herbster
(
(CodePage: 037; Name: 'IBM037'), //IBM EBCDIC US-Canada
(CodePage: 437; Name: 'IBM437'), //OEM United States
(CodePage: 500; Name: 'IBM500'), //IBM EBCDIC International
(CodePage: 708; Name: 'ASMO-708'), //Arabic (ASMO 708)
(CodePage: 709; Name: ''), //Arabic (ASMO-449+, BCON V4)
(CodePage: 710; Name: ''), //Arabic - Transparent Arabic
(CodePage: 720; Name: 'DOS-720'), //Arabic (Transparent ASMO); Arabic (DOS)
(CodePage: 737; Name: 'ibm737'), //OEM Greek (formerly 437G); Greek (DOS)
(CodePage: 775; Name: 'ibm775'), //OEM Baltic; Baltic (DOS)
(CodePage: 850; Name: 'ibm850'), //OEM Multilingual Latin 1; Western European (DOS)
(CodePage: 852; Name: 'ibm852'), //OEM Latin 2; Central European (DOS)
(CodePage: 855; Name: 'IBM855'), //OEM Cyrillic (primarily Russian)
(CodePage: 857; Name: 'ibm857'), //OEM Turkish; Turkish (DOS)
(CodePage: 858; Name: 'IBM00858'), //OEM Multilingual Latin 1 + Euro symbol
(CodePage: 860; Name: 'IBM860'), //OEM Portuguese; Portuguese (DOS)
(CodePage: 861; Name: 'ibm861'), //OEM Icelandic; Icelandic (DOS)
(CodePage: 862; Name: 'DOS-862'), //OEM Hebrew; Hebrew (DOS)
(CodePage: 863; Name: 'IBM863'), //OEM French Canadian; French Canadian (DOS)
(CodePage: 864; Name: 'IBM864'), //OEM Arabic; Arabic (864)
(CodePage: 865; Name: 'IBM865'), //OEM Nordic; Nordic (DOS)
(CodePage: 866; Name: 'cp866'), //OEM Russian; Cyrillic (DOS)
(CodePage: 869; Name: 'ibm869'), //OEM Modern Greek; Greek, Modern (DOS)
(CodePage: 870; Name: 'IBM870'), //IBM EBCDIC Multilingual/ROECE (Latin 2); IBM EBCDIC Multilingual Latin 2
(CodePage: 874; Name: 'windows-874'), //ANSI/OEM Thai (ISO 8859-11); Thai (Windows)
(CodePage: 875; Name: 'cp875'), //IBM EBCDIC Greek Modern
(CodePage: 932; Name: 'shift_jis'), //ANSI/OEM Japanese; Japanese (Shift-JIS)
(CodePage: 936; Name: 'gb2312'), //ANSI/OEM Simplified Chinese (PRC, Singapore); Chinese Simplified (GB2312)
(CodePage: 949; Name: 'ks_c_5601-1987'), //ANSI/OEM Korean (Unified Hangul Code)
(CodePage: 950; Name: 'big5'), //ANSI/OEM Traditional Chinese (Taiwan; Hong Kong SAR, PRC); Chinese Traditional (Big5)
(CodePage: 1026; Name: 'IBM1026'), //IBM EBCDIC Turkish (Latin 5)
(CodePage: 1047; Name: 'IBM01047'), //IBM EBCDIC Latin 1/Open System
(CodePage: 1140; Name: 'IBM01140'), //IBM EBCDIC US-Canada (037 + Euro symbol); IBM EBCDIC (US-Canada-Euro)
(CodePage: 1141; Name: 'IBM01141'), //IBM EBCDIC Germany (20273 + Euro symbol); IBM EBCDIC (Germany-Euro)
(CodePage: 1142; Name: 'IBM01142'), //IBM EBCDIC Denmark-Norway (20277 + Euro symbol); IBM EBCDIC (Denmark-Norway-Euro)
(CodePage: 1143; Name: 'IBM01143'), //IBM EBCDIC Finland-Sweden (20278 + Euro symbol); IBM EBCDIC (Finland-Sweden-Euro)
(CodePage: 1144; Name: 'IBM01144'), //IBM EBCDIC Italy (20280 + Euro symbol); IBM EBCDIC (Italy-Euro)
(CodePage: 1145; Name: 'IBM01145'), //IBM EBCDIC Latin America-Spain (20284 + Euro symbol); IBM EBCDIC (Spain-Euro)
(CodePage: 1146; Name: 'IBM01146'), //IBM EBCDIC United Kingdom (20285 + Euro symbol); IBM EBCDIC (UK-Euro)
(CodePage: 1147; Name: 'IBM01147'), //IBM EBCDIC France (20297 + Euro symbol); IBM EBCDIC (France-Euro)
(CodePage: 1148; Name: 'IBM01148'), //IBM EBCDIC International (500 + Euro symbol); IBM EBCDIC (International-Euro)
(CodePage: 1149; Name: 'IBM01149'), //IBM EBCDIC Icelandic (20871 + Euro symbol); IBM EBCDIC (Icelandic-Euro)
(CodePage: 1200; Name: 'utf-16'), //Unicode UTF-16, little endian byte order (BMP of ISO 10646); available only to managed applications
(CodePage: 1201; Name: 'unicodeFFFE'), //Unicode UTF-16, big endian byte order; available only to managed applications
(CodePage: 1250; Name: 'windows-1250'), //ANSI Central European; Central European (Windows)
(CodePage: 1251; Name: 'windows-1251'), //ANSI Cyrillic; Cyrillic (Windows)
(CodePage: 1252; Name: 'windows-1252'), //ANSI Latin 1; Western European (Windows)
(CodePage: 1253; Name: 'windows-1253'), //ANSI Greek; Greek (Windows)
(CodePage: 1254; Name: 'windows-1254'), //ANSI Turkish; Turkish (Windows)
(CodePage: 1255; Name: 'windows-1255'), //ANSI Hebrew; Hebrew (Windows)
(CodePage: 1256; Name: 'windows-1256'), //ANSI Arabic; Arabic (Windows)
(CodePage: 1257; Name: 'windows-1257'), //ANSI Baltic; Baltic (Windows)
(CodePage: 1258; Name: 'windows-1258'), //ANSI/OEM Vietnamese; Vietnamese (Windows)
(CodePage: 1361; Name: 'Johab'), //Korean (Johab)
(CodePage: 10000; Name: 'macintosh'), //MAC Roman; Western European (Mac)
(CodePage: 10001; Name: 'x-mac-japanese'), //Japanese (Mac)
(CodePage: 10002; Name: 'x-mac-chinesetrad'), //MAC Traditional Chinese (Big5); Chinese Traditional (Mac)
(CodePage: 10003; Name: 'x-mac-korean'), //Korean (Mac)
(CodePage: 10004; Name: 'x-mac-arabic'), //Arabic (Mac)
(CodePage: 10005; Name: 'x-mac-hebrew'), //Hebrew (Mac)
(CodePage: 10006; Name: 'x-mac-greek'), //Greek (Mac)
(CodePage: 10007; Name: 'x-mac-cyrillic'), //Cyrillic (Mac)
(CodePage: 10008; Name: 'x-mac-chinesesimp'), //MAC Simplified Chinese (GB 2312); Chinese Simplified (Mac)
(CodePage: 10010; Name: 'x-mac-romanian'), //Romanian (Mac)
(CodePage: 10017; Name: 'x-mac-ukrainian'), //Ukrainian (Mac)
(CodePage: 10021; Name: 'x-mac-thai'), //Thai (Mac)
(CodePage: 10029; Name: 'x-mac-ce'), //MAC Latin 2; Central European (Mac)
(CodePage: 10079; Name: 'x-mac-icelandic'), //Icelandic (Mac)
(CodePage: 10081; Name: 'x-mac-turkish'), //Turkish (Mac)
(CodePage: 10082; Name: 'x-mac-croatian'), //Croatian (Mac)
(CodePage: 12000; Name: 'utf-32'), //Unicode UTF-32, little endian byte order; available only to managed applications
(CodePage: 12001; Name: 'utf-32BE'), //Unicode UTF-32, big endian byte order; available only to managed applications
(CodePage: 20000; Name: 'x-Chinese_CNS'), //CNS Taiwan; Chinese Traditional (CNS)
(CodePage: 20001; Name: 'x-cp20001'), //TCA Taiwan
(CodePage: 20002; Name: 'x_Chinese-Eten'), //Eten Taiwan; Chinese Traditional (Eten)
(CodePage: 20003; Name: 'x-cp20003'), //IBM5550 Taiwan
(CodePage: 20004; Name: 'x-cp20004'), //TeleText Taiwan
(CodePage: 20005; Name: 'x-cp20005'), //Wang Taiwan
(CodePage: 20105; Name: 'x-IA5'), //IA5 (IRV International Alphabet No. 5, 7-bit); Western European (IA5)
(CodePage: 20106; Name: 'x-IA5-German'), //IA5 German (7-bit)
(CodePage: 20107; Name: 'x-IA5-Swedish'), //IA5 Swedish (7-bit)
(CodePage: 20108; Name: 'x-IA5-Norwegian'), //IA5 Norwegian (7-bit)
(CodePage: 20127; Name: 'us-ascii'), //US-ASCII (7-bit)
(CodePage: 20261; Name: 'x-cp20261'), //T.61
(CodePage: 20269; Name: 'x-cp20269'), //ISO 6937 Non-Spacing Accent
(CodePage: 20273; Name: 'IBM273'), //IBM EBCDIC Germany
(CodePage: 20277; Name: 'IBM277'), //IBM EBCDIC Denmark-Norway
(CodePage: 20278; Name: 'IBM278'), //IBM EBCDIC Finland-Sweden
(CodePage: 20280; Name: 'IBM280'), //IBM EBCDIC Italy
(CodePage: 20284; Name: 'IBM284'), //IBM EBCDIC Latin America-Spain
(CodePage: 20285; Name: 'IBM285'), //IBM EBCDIC United Kingdom
(CodePage: 20290; Name: 'IBM290'), //IBM EBCDIC Japanese Katakana Extended
(CodePage: 20297; Name: 'IBM297'), //IBM EBCDIC France
(CodePage: 20420; Name: 'IBM420'), //IBM EBCDIC Arabic
(CodePage: 20423; Name: 'IBM423'), //IBM EBCDIC Greek
(CodePage: 20424; Name: 'IBM424'), //IBM EBCDIC Hebrew
(CodePage: 20833; Name: 'x-EBCDIC-KoreanExtended'), //IBM EBCDIC Korean Extended
(CodePage: 20838; Name: 'IBM-Thai'), //IBM EBCDIC Thai
(CodePage: 20866; Name: 'koi8-r'), //Russian (KOI8-R); Cyrillic (KOI8-R)
(CodePage: 20871; Name: 'IBM871'), //IBM EBCDIC Icelandic
(CodePage: 20880; Name: 'IBM880'), //IBM EBCDIC Cyrillic Russian
(CodePage: 20905; Name: 'IBM905'), //IBM EBCDIC Turkish
(CodePage: 20924; Name: 'IBM00924'), //IBM EBCDIC Latin 1/Open System (1047 + Euro symbol)
(CodePage: 20932; Name: 'EUC-JP'), //Japanese (JIS 0208-1990 and 0212-1990)
(CodePage: 20936; Name: 'x-cp20936'), //Simplified Chinese (GB2312); Chinese Simplified (GB2312-80)
(CodePage: 20949; Name: 'x-cp20949'), //Korean Wansung
(CodePage: 21025; Name: 'cp1025'), //IBM EBCDIC Cyrillic Serbian-Bulgarian
(CodePage: 21027; Name: ''), //(deprecated)
(CodePage: 21866; Name: 'koi8-u'), //Ukrainian (KOI8-U); Cyrillic (KOI8-U)
(CodePage: 28591; Name: 'iso-8859-1'), //ISO 8859-1 Latin 1; Western European (ISO)
(CodePage: 28592; Name: 'iso-8859-2'), //ISO 8859-2 Central European; Central European (ISO)
(CodePage: 28593; Name: 'iso-8859-3'), //ISO 8859-3 Latin 3
(CodePage: 28594; Name: 'iso-8859-4'), //ISO 8859-4 Baltic
(CodePage: 28595; Name: 'iso-8859-5'), //ISO 8859-5 Cyrillic
(CodePage: 28596; Name: 'iso-8859-6'), //ISO 8859-6 Arabic
(CodePage: 28597; Name: 'iso-8859-7'), //ISO 8859-7 Greek
(CodePage: 28598; Name: 'iso-8859-8'), //ISO 8859-8 Hebrew; Hebrew (ISO-Visual)
(CodePage: 28599; Name: 'iso-8859-9'), //ISO 8859-9 Turkish
(CodePage: 28603; Name: 'iso-8859-13'), //ISO 8859-13 Estonian
(CodePage: 28605; Name: 'iso-8859-15'), //ISO 8859-15 Latin 9
(CodePage: 29001; Name: 'x-Europa'), //Europa 3
(CodePage: 38598; Name: 'iso-8859-8-i'), //ISO 8859-8 Hebrew; Hebrew (ISO-Logical)
(CodePage: 50220; Name: 'iso-2022-jp'), //ISO 2022 Japanese with no halfwidth Katakana; Japanese (JIS)
(CodePage: 50221; Name: 'csISO2022JP'), //ISO 2022 Japanese with halfwidth Katakana; Japanese (JIS-Allow 1 byte Kana)
(CodePage: 50222; Name: 'iso-2022-jp'), //ISO 2022 Japanese JIS X 0201-1989; Japanese (JIS-Allow 1 byte Kana - SO/SI)
(CodePage: 50225; Name: 'iso-2022-kr'), //ISO 2022 Korean
(CodePage: 50227; Name: 'x-cp50227'), //ISO 2022 Simplified Chinese; Chinese Simplified (ISO 2022)
(CodePage: 50229; Name: ''), //ISO 2022 Traditional Chinese
(CodePage: 50930; Name: ''), //EBCDIC Japanese (Katakana) Extended
(CodePage: 50931; Name: ''), //EBCDIC US-Canada and Japanese
(CodePage: 50933; Name: ''), //EBCDIC Korean Extended and Korean
(CodePage: 50935; Name: ''), //EBCDIC Simplified Chinese Extended and Simplified Chinese
(CodePage: 50936; Name: ''), //EBCDIC Simplified Chinese
(CodePage: 50937; Name: ''), //EBCDIC US-Canada and Traditional Chinese
(CodePage: 50939; Name: ''), //EBCDIC Japanese (Latin) Extended and Japanese
(CodePage: 51932; Name: 'euc-jp'), //EUC Japanese
(CodePage: 51936; Name: 'EUC-CN'), //EUC Simplified Chinese; Chinese Simplified (EUC)
(CodePage: 51949; Name: 'euc-kr'), //EUC Korean
(CodePage: 51950; Name: ''), //EUC Traditional Chinese
(CodePage: 52936; Name: 'hz-gb-2312'), //HZ-GB2312 Simplified Chinese; Chinese Simplified (HZ)
(CodePage: 54936; Name: 'GB18030'), //Windows XP and later: GB18030 Simplified Chinese (4 byte); Chinese Simplified (GB18030)
(CodePage: 57002; Name: 'x-iscii-de'), //ISCII Devanagari
(CodePage: 57003; Name: 'x-iscii-be'), //ISCII Bangla
(CodePage: 57004; Name: 'x-iscii-ta'), //ISCII Tamil
(CodePage: 57005; Name: 'x-iscii-te'), //ISCII Telugu
(CodePage: 57006; Name: 'x-iscii-as'), //ISCII Assamese
(CodePage: 57007; Name: 'x-iscii-or'), //ISCII Odia
(CodePage: 57008; Name: 'x-iscii-ka'), //ISCII Kannada
(CodePage: 57009; Name: 'x-iscii-ma'), //ISCII Malayalam
(CodePage: 57010; Name: 'x-iscii-gu'), //ISCII Gujarati
(CodePage: 57011; Name: 'x-iscii-pa'), //ISCII Punjabi
(CodePage: 65000; Name: 'utf-7'), //Unicode (UTF-7)
(CodePage: 65001; Name: 'utf-8') //Unicode (UTF-8)
);
var
i: Integer;
begin
Result := 0;
if EncodingName = '' then
Exit;
for i := Low(cp) to High(cp) do
begin
if cp[i].Name = '' then
Continue;
if SameText(EncodingName, cp[i].Name) then
begin
Result := cp[i].CodePage;
Exit;
end;
end;
end;
答案 3 :(得分:-1)
ICU具有可用于此目的的功能,尽管它有点笨拙(如果您要完全使用ICU,则最好继续使用其转换工具,而不要使用它来获取代码页,然后使用MultiByteToWideChar)。