How to detect the culture info of a text

Asked: 2017-07-03 11:28:10

Tags: c#

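I want to create a game in which the user saves some words and their meanings in a string array.

The program then quizzes the user: it shows a word, and the user has to guess the meaning they saved earlier.

I don't know in advance which language a meaning was saved in. I want to inspect the characters of the meaning and automatically switch the input language of the quiz text box to the language of the meaning, so that the user doesn't have to change it manually.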

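For example:

First round: the word is "Hello" and the user should type "سلام" (Arabic), so the input language should be Arabic.

Second round: the word is "Hello" and the user should type "Tjena" (Swedish), so the input language should be Swedish.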

 // detectedCultureName is a placeholder for the culture name detected from the meaning, e.g. "ar-EG".
 Application.CurrentInputLanguage = 
        InputLanguage.FromCulture(new CultureInfo(detectedCultureName)); 

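There is a way to detect characters with Regex:

if (Regex.IsMatch(theMeaningString, @"\p{IsArabic}"))
{
    Application.CurrentInputLanguage = 
        InputLanguage.FromCulture(new CultureInfo("ar-EG")); 
}

But there are so many keyboard input languages that checking for each of them this way is impractical. Is there any way to detect the culture info with Regex?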

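2 answers:

Answer 0: (score: 2)

I had to do exactly this, so I wrote a method that tries to build a list of culture names matching the characters in the string. Because some languages share characters, the list can sometimes be wrong, but it is good enough for my purposes. I then compare the list with the installed keyboard layouts. If a culture in the list matches an installed keyboard layout, I take it to be the culture of the text, because the user who typed it must have used an installed keyboard layout. Leave a comment if you have a better idea!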

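   // Requires the namespaces System.Collections.Generic, System.Globalization,
   // System.Text.RegularExpressions and System.Windows.Forms.
   private CultureInfo GetMatchingCultureInfo(string theString)
    {

        // Candidate culture names for the first Unicode block found in the string.
        List<string> arrayOfCultureNames = new List<string>();

        // Caveat: \p{IsBasicLatin} covers all of ASCII (spaces, digits and punctuation
        // included), so any string containing such a character takes this branch.
        if (Regex.IsMatch(theString, @"\p{IsBasicLatin}"))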
        {
            // Major alphabets:
            //English
            //French
            //Spanish
            //German
            //Swedish
            //Vietnamese 

            arrayOfCultureNames.Add("sv-FI"); //   Swedish - Finland    0x081D  SVF
            arrayOfCultureNames.Add("sv-SE"); //   Swedish - Sweden 0x041D 
            arrayOfCultureNames.Add("fr-BE");
            arrayOfCultureNames.Add("fr-CA");
            arrayOfCultureNames.Add("fr-FR");
            arrayOfCultureNames.Add("fr-LU");
            arrayOfCultureNames.Add("fr-MC");
            arrayOfCultureNames.Add("fr-CH");
            arrayOfCultureNames.Add("de-AT");
            arrayOfCultureNames.Add("de-DE");
            arrayOfCultureNames.Add("de-LI");
            arrayOfCultureNames.Add("de-LU");
            arrayOfCultureNames.Add("de-CH");
            arrayOfCultureNames.Add("es-AR"); //   Spanish - Argentina  0x2C0A  ESS
            arrayOfCultureNames.Add("es-BO"); //   Spanish - Bolivia    0x400A  ESB
            arrayOfCultureNames.Add("es-CL"); //   Spanish - Chile  0x340A  ESL
            arrayOfCultureNames.Add("es-CO"); //   Spanish - Colombia   0x240A  ESO
            arrayOfCultureNames.Add("es-CR"); //   Spanish - Costa Rica 0x140A  ESC
            arrayOfCultureNames.Add("es-DO"); //   Spanish - Dominican Republic 0x1C0A  ESD
            arrayOfCultureNames.Add("es-EC"); //   Spanish - Ecuador    0x300A  ESF
            arrayOfCultureNames.Add("es-SV"); //   Spanish - El Salvador    0x440A  ESE
            arrayOfCultureNames.Add("es-GT"); //   Spanish - Guatemala  0x100A  ESG
            arrayOfCultureNames.Add("es-HN"); //   Spanish - Honduras   0x480A  ESH
            arrayOfCultureNames.Add("es-MX"); //   Spanish - Mexico 0x080A  ESM
            arrayOfCultureNames.Add("es-NI"); //   Spanish - Nicaragua  0x4C0A  ESI
            arrayOfCultureNames.Add("es-PA"); //   Spanish - Panama 0x180A  ESA
            arrayOfCultureNames.Add("es-PY"); //   Spanish - Paraguay   0x3C0A  ESZ
            arrayOfCultureNames.Add("es-PE"); //   Spanish - Peru   0x280A  ESR
            arrayOfCultureNames.Add("es-PR"); //   Spanish - Puerto Rico    0x500A  ES
            arrayOfCultureNames.Add("es-ES"); //   Spanish - Spain  0x0C0A
            arrayOfCultureNames.Add("es-UY"); //   Spanish - Uruguay    0x380A  ESY
            arrayOfCultureNames.Add("es-VE"); //   Spanish - Venezuela  0x200A  ESV  
            arrayOfCultureNames.Add("vi-VN");
            arrayOfCultureNames.Add("nl-BE");
            arrayOfCultureNames.Add("en-AU");
            arrayOfCultureNames.Add("en-CA");
            arrayOfCultureNames.Add("en-029"); //   English - Caribbean (replaces the legacy name en-CB)
            arrayOfCultureNames.Add("en-IE");
            arrayOfCultureNames.Add("en-JM");
            arrayOfCultureNames.Add("en-NZ");
            arrayOfCultureNames.Add("en-PH");
            arrayOfCultureNames.Add("en-ZA");
            arrayOfCultureNames.Add("en-TT");
            arrayOfCultureNames.Add("en-GB");
            arrayOfCultureNames.Add("en-US");
            arrayOfCultureNames.Add("en-ZW"); 
        }

        else if (Regex.IsMatch(theString, @"\p{IsLatinExtended-A}"))
        {
            // Major alphabets:
            //Latin
            //Czech
            //Dutch
            //Polish
            //Turkish
            //Lithuanian
            //Latvian   
            arrayOfCultureNames.Add("cs-CZ"); //   Czech - Czech Republic    0x0405  CSY  
            arrayOfCultureNames.Add("nl-BE"); //   Dutch
            arrayOfCultureNames.Add("hu-HU"); //   Hungarian - Hungary  0x040E  HUN  
            arrayOfCultureNames.Add("lv-LV"); //   Latvian - Latvia 0x0426  LVI
            arrayOfCultureNames.Add("lt-LT"); //   Lithuanian - Lithuania   0x0427  LTH 
            arrayOfCultureNames.Add("pl-PL"); //   Polish - Poland  0x0415  PLK    
            arrayOfCultureNames.Add("tr-TR"); //   Turkish - Turkey 0x041F  TRK  
        }

        else if (Regex.IsMatch(theString, @"\p{IsLatinExtended-B}"))
        {
            // Major alphabets:
            // Africa alphabet
            //Pan - Nigerian
            //Americanist
            //Khoisan
            //Pinyin
            //Romanian
            arrayOfCultureNames.Add("af-ZA");    //  Afrikaans - South Africa   0x0436  AFK    
            arrayOfCultureNames.Add("ro-RO"); //   Romanian - Romania   0x0418  ROM  
        }

        else if (Regex.IsMatch(theString, @"\p{IsGreek}") || Regex.IsMatch(theString, @"\p{IsGreekandCoptic}"))
        {
            // Major alphabets:
            //Greek 
            arrayOfCultureNames.Add("el-GR");  
        }

        else if (Regex.IsMatch(theString, @"\p{IsCyrillic}"))
        {
            // Major alphabets: 
            // Belarus
            // Bosnia and Herzegovina(mostly in Serb inhabited parts of the country))
            //Bulgaria
            //Kazakhstan
            //Kyrgyzstan
            //Macedonia
            //Mongolia(also Mongolian Script)
            //Montenegro(also Latin)
            //Russia
            //Serbia(also Latin)
            //Tajikistan
            //Ukraine   
            arrayOfCultureNames.Add("be-BY"); //   Belarusian - Belarus  0x0423  BEL
            arrayOfCultureNames.Add("bg-BG"); //   Bulgarian - Bulgaria  0x0402  BGR   
            arrayOfCultureNames.Add("kk-KZ"); //   Kazakh - Kazakhstan  0x043F 
            arrayOfCultureNames.Add("ky-KG"); //   Kyrgyz - Kyrgyzstan  0x0440 
            arrayOfCultureNames.Add("mk-MK"); //   Macedonian (FYROM)   0x042F  MKD  
            arrayOfCultureNames.Add("mn-MN"); //   Mongolian - Mongolia 0x0450  
            arrayOfCultureNames.Add("ru-RU"); //   Russian - Russia 0x0419  RUS 
            arrayOfCultureNames.Add("sr-Cyrl-RS"); //   Serbian (Cyrillic) - Serbia (replaces the legacy .NET 1.x name Cy-sr-SP)
            arrayOfCultureNames.Add("sr-Latn-RS"); //   Serbian (Latin) - Serbia (replaces the legacy .NET 1.x name Lt-sr-SP)
            arrayOfCultureNames.Add("tt-RU"); //   Tatar - Russia   0x0444 
            arrayOfCultureNames.Add("uk-UA"); //   Ukrainian - Ukraine  0x0422  UKR  
        }

        else if (Regex.IsMatch(theString, @"\p{IsArmenian}"))
        {
            // Major alphabets:
            //Armenian 
            arrayOfCultureNames.Add("hy-AM"); //   Armenian - Armenia    0x042B   
        }

        else if (Regex.IsMatch(theString, @"\p{IsHebrew}"))
        {
            // Major alphabets:
            //Hebrew   
            arrayOfCultureNames.Add("he-IL"); //   Hebrew - Israel  0x040D  HEB   
        }

        else if (Regex.IsMatch(theString, @"\p{IsArabic}"))
        {
            // Major alphabets:
            //Arabic
            //Persian 

            arrayOfCultureNames.Add("fa-IR");
            arrayOfCultureNames.Add("ar-DZ"); //  Arabic - Algeria  0x1401  ARG
            arrayOfCultureNames.Add("ar-BH"); //   Arabic - Bahrain  0x3C01  ARH
            arrayOfCultureNames.Add("ar-EG"); //   Arabic - Egypt    0x0C01  ARE
            arrayOfCultureNames.Add("ar-IQ"); //   Arabic - Iraq 0x0801  ARI
            arrayOfCultureNames.Add("ar-JO"); //   Arabic - Jordan   0x2C01  ARJ
            arrayOfCultureNames.Add("ar-KW"); //   Arabic - Kuwait   0x3401  ARK
            arrayOfCultureNames.Add("ar-LB"); //   Arabic - Lebanon  0x3001  ARB
            arrayOfCultureNames.Add("ar-LY"); //   Arabic - Libya    0x1001  ARL
            arrayOfCultureNames.Add("ar-MA"); //   Arabic - Morocco  0x1801  ARM
            arrayOfCultureNames.Add("ar-OM"); //   Arabic - Oman 0x2001  ARO
            arrayOfCultureNames.Add("ar-QA"); //   Arabic - Qatar    0x4001  ARQ
            arrayOfCultureNames.Add("ar-SA"); //   Arabic - Saudi Arabia 0x0401  ARA
            arrayOfCultureNames.Add("ar-SY"); //   Arabic - Syria    0x2801  ARS
            arrayOfCultureNames.Add("ar-TN"); //   Arabic - Tunisia  0x1C01  ART
            arrayOfCultureNames.Add("ar-AE"); //   Arabic - United Arab Emirates 0x3801  ARU
            arrayOfCultureNames.Add("ar-YE"); //   Arabic - Yemen    0x2401  ARY 
        }

        else if (Regex.IsMatch(theString, @"\p{IsSyriac}"))
        {
            // Major alphabets:
            //Syriac 
            arrayOfCultureNames.Add("syr-SY"); //   Syriac - Syria  0x045A 
        }

        else if (Regex.IsMatch(theString, @"\p{IsDevanagari}"))
        {
            // Major alphabets:
            //India and Nepal 
            // Only cultures written in Devanagari are listed here. Gujarati, Kannada,
            // Punjabi (Gurmukhi), Tamil and Telugu use scripts of their own, which
            // \p{IsDevanagari} never matches.

            arrayOfCultureNames.Add("hi-IN"); //   Hindi - India    0x0439  HIN 
            arrayOfCultureNames.Add("kok-IN"); //   Konkani - India 0x0457 
            arrayOfCultureNames.Add("mr-IN"); //   Marathi - India  0x044E 
            arrayOfCultureNames.Add("sa-IN"); //   Sanskrit - India 0x044F  
        }

        else 
        {
            // Major alphabets:   
            arrayOfCultureNames.Add("az-Cyrl-AZ"); //   Azeri(Cyrillic) - Azerbaijan  0x082C (legacy name: Cy-az-AZ)
            arrayOfCultureNames.Add("az-Latn-AZ");  //  Azeri(Latin) - Azerbaijan 0x042C (legacy name: Lt-az-AZ)
            arrayOfCultureNames.Add("eu-ES"); //   Basque - Basque   0x042D  EUQ
            arrayOfCultureNames.Add("be-BY"); //   Belarusian - Belarus  0x0423  BEL
            arrayOfCultureNames.Add("bg-BG"); //   Bulgarian - Bulgaria  0x0402  BGR
            arrayOfCultureNames.Add("ca-ES"); //   Catalan - Catalan 0x0403  CAT
            arrayOfCultureNames.Add("zh-CN"); //   Chinese - China   0x0804  CHS
            arrayOfCultureNames.Add("zh-HK"); //   Chinese - Hong Kong SAR   0x0C04  ZHH
            arrayOfCultureNames.Add("zh-MO"); //   Chinese - Macau SAR   0x1404
            arrayOfCultureNames.Add("zh-SG"); //   Chinese - Singapore   0x1004  ZHI
            arrayOfCultureNames.Add("zh-TW"); //   Chinese - Taiwan  0x0404  CHT
            arrayOfCultureNames.Add("zh-CHS"); //   Chinese(Simplified) 0x0004
            arrayOfCultureNames.Add("zh-CHT"); //   Chinese(Traditional)    0x7C04
            arrayOfCultureNames.Add("hr-HR"); //   Croatian - Croatia    0x041A  HRV
            arrayOfCultureNames.Add("cs-CZ"); //   Czech - Czech Republic    0x0405  CSY
            arrayOfCultureNames.Add("da-DK"); //   Danish - Denmark  0x0406  DAN
            arrayOfCultureNames.Add("dv-MV"); //   Dhivehi - Maldives   0x0465 (legacy name: div-MV)
            arrayOfCultureNames.Add("he-IL"); //   Hebrew - Israel  0x040D  HEB
            arrayOfCultureNames.Add("hi-IN"); //   Hindi - India    0x0439  HIN
            arrayOfCultureNames.Add("hu-HU"); //   Hungarian - Hungary  0x040E  HUN
            arrayOfCultureNames.Add("is-IS");  //  Icelandic - Iceland  0x040F  ISL
            arrayOfCultureNames.Add("id-ID"); //   Indonesian - Indonesia   0x0421
            arrayOfCultureNames.Add("it-IT"); //   Italian - Italy  0x0410
            arrayOfCultureNames.Add("it-CH"); //   Italian - Switzerland    0x0810  ITS
            arrayOfCultureNames.Add("ja-JP"); //   Japanese - Japan 0x0411  JPN
            arrayOfCultureNames.Add("kn-IN"); //   Kannada - India  0x044B
            arrayOfCultureNames.Add("kk-KZ"); //   Kazakh - Kazakhstan  0x043F
            arrayOfCultureNames.Add("kok-IN"); //   Konkani - India 0x0457
            arrayOfCultureNames.Add("ko-KR"); //   Korean - Korea   0x0412  KOR
            arrayOfCultureNames.Add("ky-KG"); //   Kyrgyz - Kyrgyzstan  0x0440
            arrayOfCultureNames.Add("lv-LV"); //   Latvian - Latvia 0x0426  LVI
            arrayOfCultureNames.Add("lt-LT"); //   Lithuanian - Lithuania   0x0427  LTH
            arrayOfCultureNames.Add("mk-MK"); //   Macedonian (FYROM)   0x042F  MKD
            arrayOfCultureNames.Add("ms-BN"); //   Malay - Brunei   0x083E
            arrayOfCultureNames.Add("ms-MY"); //   Malay - Malaysia 0x043E
            arrayOfCultureNames.Add("mr-IN"); //   Marathi - India  0x044E
            arrayOfCultureNames.Add("mn-MN"); //   Mongolian - Mongolia 0x0450
            arrayOfCultureNames.Add("nb-NO"); //   Norwegian (Bokmål) - Norway  0x0414
            arrayOfCultureNames.Add("nn-NO"); //   Norwegian (Nynorsk) - Norway 0x0814
            arrayOfCultureNames.Add("pl-PL"); //   Polish - Poland  0x0415  PLK
            arrayOfCultureNames.Add("pt-BR"); //   Portuguese - Brazil  0x0416  PTB
            arrayOfCultureNames.Add("pt-PT"); //   Portuguese - Portugal    0x0816
            arrayOfCultureNames.Add("pa-IN"); //   Punjabi - India  0x0446
            arrayOfCultureNames.Add("ro-RO"); //   Romanian - Romania   0x0418  ROM
            arrayOfCultureNames.Add("ru-RU"); //   Russian - Russia 0x0419  RUS
            arrayOfCultureNames.Add("sa-IN"); //   Sanskrit - India 0x044F
            arrayOfCultureNames.Add("sr-Cyrl-RS"); //   Serbian (Cyrillic) - Serbia (replaces the legacy .NET 1.x name Cy-sr-SP)
            arrayOfCultureNames.Add("sr-Latn-RS"); //   Serbian (Latin) - Serbia (replaces the legacy .NET 1.x name Lt-sr-SP)
            arrayOfCultureNames.Add("sk-SK"); //   Slovak - Slovakia    0x041B  SKY
            arrayOfCultureNames.Add("sl-SI"); //   Slovenian - Slovenia 0x0424  SLV
            arrayOfCultureNames.Add("es-AR"); //   Spanish - Argentina  0x2C0A  ESS
            arrayOfCultureNames.Add("es-BO"); //   Spanish - Bolivia    0x400A  ESB
            arrayOfCultureNames.Add("es-CL"); //   Spanish - Chile  0x340A  ESL
            arrayOfCultureNames.Add("es-CO"); //   Spanish - Colombia   0x240A  ESO
            arrayOfCultureNames.Add("es-CR"); //   Spanish - Costa Rica 0x140A  ESC
            arrayOfCultureNames.Add("es-DO"); //   Spanish - Dominican Republic 0x1C0A  ESD
            arrayOfCultureNames.Add("es-EC"); //   Spanish - Ecuador    0x300A  ESF
            arrayOfCultureNames.Add("es-SV"); //   Spanish - El Salvador    0x440A  ESE
            arrayOfCultureNames.Add("es-GT"); //   Spanish - Guatemala  0x100A  ESG
            arrayOfCultureNames.Add("es-HN"); //   Spanish - Honduras   0x480A  ESH
            arrayOfCultureNames.Add("es-MX"); //   Spanish - Mexico 0x080A  ESM
            arrayOfCultureNames.Add("es-NI"); //   Spanish - Nicaragua  0x4C0A  ESI
            arrayOfCultureNames.Add("es-PA"); //   Spanish - Panama 0x180A  ESA
            arrayOfCultureNames.Add("es-PY"); //   Spanish - Paraguay   0x3C0A  ESZ
            arrayOfCultureNames.Add("es-PE"); //   Spanish - Peru   0x280A  ESR
            arrayOfCultureNames.Add("es-PR"); //   Spanish - Puerto Rico    0x500A  ES
            arrayOfCultureNames.Add("es-ES"); //   Spanish - Spain  0x0C0A
            arrayOfCultureNames.Add("es-UY"); //   Spanish - Uruguay    0x380A  ESY
            arrayOfCultureNames.Add("es-VE"); //   Spanish - Venezuela  0x200A  ESV
            arrayOfCultureNames.Add("sw-KE");  //   Swahili - Kenya 0x0441
            arrayOfCultureNames.Add("syr-SY"); //   Syriac - Syria  0x045A
            arrayOfCultureNames.Add("ta-IN"); //   Tamil - India    0x0449
            arrayOfCultureNames.Add("tt-RU"); //   Tatar - Russia   0x0444
            arrayOfCultureNames.Add("te-IN"); //   Telugu - India   0x044A
            arrayOfCultureNames.Add("th-TH"); //   Thai - Thailand  0x041E  THA
            arrayOfCultureNames.Add("tr-TR"); //   Turkish - Turkey 0x041F  TRK
            arrayOfCultureNames.Add("uk-UA"); //   Ukrainian - Ukraine  0x0422  UKR
            arrayOfCultureNames.Add("ur-PK");  //  Urdu - Pakistan  0x0420  URD
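            arrayOfCultureNames.Add("uz-Cyrl-UZ"); //   Uzbek (Cyrillic) - Uzbekistan (legacy name: Cy-uz-UZ)
            arrayOfCultureNames.Add("uz-Latn-UZ"); //   Uzbek (Latin) - Uzbekistan (legacy name: Lt-uz-UZ)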
        }


        // Compare the candidate list with the input languages (keyboard layouts) that are
        // actually installed. CultureTypes.InstalledWin32Cultures lists the cultures known to
        // Windows, not the keyboard layouts, so InputLanguage.InstalledInputLanguages is used instead.
        foreach (string candidate in arrayOfCultureNames)
        {
            foreach (InputLanguage il in InputLanguage.InstalledInputLanguages)
            {
                if (string.Equals(il.Culture.Name, candidate, System.StringComparison.OrdinalIgnoreCase))
                {
                    return il.Culture;
                }
            }
        }

        // If no culture was detected, fall back to the first installed keyboard layout.
        if (InputLanguage.InstalledInputLanguages.Count > 0)
        {
            return InputLanguage.InstalledInputLanguages[0].Culture;
        }

        // If none of the above works, return null.
        return null;
    }

Then, once I have the culture info, I can change the keyboard layout:

CultureInfo cf = GetMatchingCultureInfo(theMeaningString);
if (cf != null)
{
    System.Windows.Forms.Application.CurrentInputLanguage = InputLanguage.FromCulture(cf);
}
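
One weakness of the first-match approach above is that a single space or digit is enough to select the Basic Latin branch. A possible refinement, sketched here under my own assumptions and not taken from the answer, is to count only letters and pick the script that most of them belong to. The class name `ScriptDetector`, the method `DetectDominantScript` and the script labels are hypothetical; the `\p{Is...}` block names are the ones .NET regular expressions support.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ScriptDetector
{
    // Hypothetical table: script label -> Unicode block pattern. Extend as needed.
    private static readonly KeyValuePair<string, Regex>[] Scripts =
    {
        new KeyValuePair<string, Regex>("Arabic", new Regex(@"\p{IsArabic}")),
        new KeyValuePair<string, Regex>("Cyrillic", new Regex(@"\p{IsCyrillic}")),
        new KeyValuePair<string, Regex>("Greek", new Regex(@"\p{IsGreek}")),
        new KeyValuePair<string, Regex>("Hebrew", new Regex(@"\p{IsHebrew}")),
        new KeyValuePair<string, Regex>("Latin", new Regex(
            @"[\p{IsBasicLatin}\p{IsLatin-1Supplement}\p{IsLatinExtended-A}\p{IsLatinExtended-B}]")),
    };

    // Returns the label of the script most letters belong to, or null if none is recognised.
    public static string DetectDominantScript(string text)
    {
        var counts = new Dictionary<string, int>();
        foreach (char c in text)
        {
            // Skip spaces, digits and punctuation so they cannot vote for Latin.
            if (!char.IsLetter(c)) continue;

            string letter = c.ToString();
            foreach (KeyValuePair<string, Regex> script in Scripts)
            {
                if (script.Value.IsMatch(letter))
                {
                    int seen;
                    counts.TryGetValue(script.Key, out seen);
                    counts[script.Key] = seen + 1;
                    break;
                }
            }
        }

        string best = null;
        int bestCount = 0;
        foreach (KeyValuePair<string, int> pair in counts)
        {
            if (pair.Value > bestCount)
            {
                best = pair.Key;
                bestCount = pair.Value;
            }
        }
        return best;
    }
}
```

The returned label could then select one of the candidate culture lists, which is compared with the installed keyboard layouts as before. This still identifies only the script, not the language: Swedish and French text both come back as Latin.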

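Answer 1: (score: 1)

Text files do not store any culture information. You seem to be confusing encoding with language.

If you need to keep language information in a particular text file, consider using a header that your application understands and that specifies the language of everything that follows: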

[English] //specifies the language is english
....

And another file would be:

[Swedish]
.....
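
Reading such a header back could look roughly like the sketch below. It assumes the header is the first non-empty line, is wrapped in square brackets, and may be followed by a `//` comment, as in the examples above. The class `LanguageHeader` and its `Read` method are hypothetical names, not part of any library.

```csharp
using System;
using System.IO;

public static class LanguageHeader
{
    // Returns the name inside a leading "[Name]" header, or null if the first
    // non-empty line is not such a header.
    public static string Read(TextReader reader)
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            // Drop an optional trailing "// comment" and surrounding whitespace.
            int comment = line.IndexOf("//", StringComparison.Ordinal);
            if (comment >= 0) line = line.Substring(0, comment);
            line = line.Trim();

            if (line.Length == 0) continue; // skip blank lines before the header

            if (line.Length > 2 && line[0] == '[' && line[line.Length - 1] == ']')
            {
                return line.Substring(1, line.Length - 2).Trim();
            }
            return null; // the first content line is not a header
        }
        return null; // empty input
    }
}
```

If the header held a culture name such as `sv-SE` instead of `Swedish`, the value could be passed straight to `new CultureInfo(...)` without a lookup table.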