Question

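For a small project, I need to output possibly localized text strings in the Windows CMD and read some strings from the program's arguments. To keep things simple, I will use a basic echo program as a demonstration.

Consider this snippet of C code: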

#include <stdio.h>

int main(int argc, char **argv) {
    // Display the first argument through the standard output:
    if (argc > 1)
        puts(argv[1]);
    return 0;
}

Here is the output of two executions:

$ test.exe Wilhelm
$ Wilhelm

$ test.exe Röntgen
$ R÷ntgen

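There you can already see that something like ö is not displayed correctly. The characters are nevertheless recognized correctly inside the program; for example, if you do the following: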

if (argv[1][1] == 'ö')
    puts("It is.");

the sentence is displayed, so the program is receiving the character correctly.
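Comparing against a character literal depends on the encoding of the source file, so it can help to look at the raw byte values the program receives instead. The following is a minimal sketch of that idea; the helper name format_hex_bytes is hypothetical and not part of the original code.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: formats the bytes of `s` into `out` as
   space-separated two-digit hex values (for example "52 F6").
   Returns the number of bytes formatted; stops early rather than
   overflowing `out`. */
size_t format_hex_bytes(const char *s, char *out, size_t out_size) {
    size_t count = 0;
    size_t used = 0;

    if (out_size == 0)
        return 0;
    out[0] = '\0';

    for (; s[count] != '\0'; ++count) {
        /* Each byte needs at most 3 characters plus the terminator. */
        if (used + 4 > out_size)
            break;
        used += (size_t)snprintf(out + used, out_size - used, "%s%02X",
                                 count ? " " : "", (unsigned char)s[count]);
    }
    return count;
}
```

Calling format_hex_bytes(argv[1], buf, sizeof buf) and printing buf with puts shows the argument byte by byte. If the second byte of Röntgen comes out as F6, that would suggest the argument arrived in an ANSI code page such as 1252 rather than in the console's OEM code page.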

So I thought that wchar_t might be needed for this, made the appropriate changes, and defined UNICODE and _UNICODE:

#include <stdio.h>

int wmain(int argc, wchar_t **argv) {
    // Display the first argument through the standard output:
    if (argc > 1)
        _putws(argv[1]);
    return 0;
}

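The output of this test program is still the same.

Looking around and reading the documentation, I found some workarounds, such as setting the locale to English, after which the text is displayed correctly. Modifying the first version (without wchar_t), I ended up with this: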

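#include <stdio.h>
#include <string.h>
#include <locale.h>

int main(int argc, char **argv) {
    // Copy the previous locale name (the pointer returned by setlocale
    // may be overwritten by the next call), then change to English:
    char old_locale[256] = "C";
    const char *current = setlocale(LC_ALL, NULL);
    if (current != NULL && strlen(current) < sizeof old_locale)
        strcpy(old_locale, current);
    setlocale(LC_ALL, "English");
    // Display the first argument through the standard output:
    if (argc > 1)
        puts(argv[1]);
    // Restore locale:
    setlocale(LC_ALL, old_locale);
    return 0;
}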

("en-US" does not seem to work with MinGW-w64, while "English" works with both MinGW-w64 and Microsoft Visual C++.)

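Now the program prints the character so that it is actually displayed correctly in the command-line window.

The problem is that setting things to English is not the best choice on, say, a Spanish or a Japanese system. So I thought about somehow getting the locale from the system. I found a function called _get_current_locale, which returns a _locale_t, but it does not seem to be what I want at all:

_locale_t_variable->locinfo->lc_category[LC_ALL].locale (a char *) seems to be NULL.

So the question is: how do I get or display text in the locale of the command line? What is the proper way to handle localized text in the Windows CMD (not necessarily Unicode)?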

Answer 1

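"Here is the output of two executions": if you are using cmd.exe, why does the prompt resemble a dollar sign? Did you set it that way? If you are indeed using cmd.exe, you can check the "code page" with: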

mode con cp /status

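If you find that it is 437, that could explain your unexpected observation. Open charmap.exe and you will find that the character you are concerned with is called "U+00F6: Latin Small Letter O With Diaeresis". If you paste it into the CLI while using code page 437, a few interesting things happen.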

The code passed to a Unicode program will be 0xF6, 0x00. Your program will receive this code.
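The byte pair 0xF6, 0x00 is simply the UTF-16 code unit 0x00F6 stored low byte first. A minimal sketch of that split, using a hypothetical helper named utf16le_bytes:

```c
#include <stdint.h>

/* Hypothetical helper: splits one UTF-16 code unit into its two bytes
   in little-endian order (low byte first). */
void utf16le_bytes(uint16_t unit, unsigned char out[2]) {
    out[0] = (unsigned char)(unit & 0xFF);        /* low byte  */
    out[1] = (unsigned char)((unit >> 8) & 0xFF); /* high byte */
}
```

For U+00F6 this yields 0xF6 followed by 0x00, and for the byte-order mark U+FEFF it yields 0xFF followed by 0xFE.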

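The character is recognized as being present in code page 437, but with the code 0x94. I believe the CLI (including the echo command) does some WYSIWYG, and the latter code (0x94) is what is displayed to you and output to stdout.

If you copy the character from the CLI to the clipboard, it gains an additional association with "OEM Text" and the 0x94 code.

Now let's switch to code page 1252: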

mode con cp select=1252

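In this code page, when you paste from the character map into the CLI, the code passed to a Unicode program is the same as in the previous scenario.

But now the character you observe is 0xF6 in the Terminal font (which visually resembles the font of code page 437), so you get the division sign. The echo command sends the same code to stdout.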
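The "R÷ntgen" output in the question is consistent with this: the same byte means different characters in the two code pages. The sketch below encodes only the two bytes discussed here plus printable ASCII. The function names are hypothetical, and the tables are deliberately incomplete.

```c
#include <stdint.h>

/* Hypothetical, deliberately tiny lookup: the Unicode code point that
   `byte` denotes in code page 437, or 0 if it is outside this table. */
uint32_t cp437_to_unicode(unsigned char byte) {
    switch (byte) {
    case 0x94: return 0x00F6; /* small o with diaeresis */
    case 0xF6: return 0x00F7; /* division sign */
    default:   return (byte >= 0x20 && byte < 0x7F) ? byte : 0;
    }
}

/* Same idea for code page 1252. */
uint32_t cp1252_to_unicode(unsigned char byte) {
    switch (byte) {
    case 0x94: return 0x201D; /* right double quotation mark */
    case 0xF6: return 0x00F6; /* small o with diaeresis */
    default:   return (byte >= 0x20 && byte < 0x7F) ? byte : 0;
    }
}
```

A byte 0xF6 that was meant as ö under code page 1252 therefore shows up as ÷ when the console interprets it under code page 437.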

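If you copy the character from the CLI to the clipboard, it gains an additional association with "OEM Text" and the 0x94 code, as before.

If you redirect the output of the echo command with this character to a file and open the file in Notepad using the Terminal font, you will see the division sign. If you change the font to Courier New, you will see "small o with diaeresis," as per Unicode.

Now switch back to code page 437: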

mode con cp select=437

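If you want a Windows Unicode program to output untranslated Unicode sequences to a FILE *, I believe you have to use binary mode. To modify your original code, you might need: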

#define _UNICODE

#include <locale.h>
#include <stdio.h>
#include <stdlib.h>

#include <tchar.h>
#include <fcntl.h>
#include <io.h>

int __cdecl _tmain(int argc, TCHAR ** argv, TCHAR ** envp) {
    wchar_t bom = 0xFEFF;

    /* Do not dereference a missing argument */
    if (argc < 2)
        return EXIT_FAILURE;

    /* Binary mode: no character translation on output */
    _setmode(_fileno(stdout), _O_BINARY);

    _ftprintf(stdout, _T("%c"), bom);
    _putts(argv[1]);

    return EXIT_SUCCESS;
  }

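In this example, we write a UTF-16LE byte-order mark ("BOM") to stdout before writing the UTF-16 characters from the argument. This looks ugly in the CLI, but if you redirect to a file or use a file directly (in binary mode), the result might be more in line with what you were originally interested in: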

#define _UNICODE

#ifdef _UNICODE
#define BOM { 0xFF, 0xFE, 0, 0 }
#else
#define BOM { 0 }
#endif

#include <locale.h>
#include <stdio.h>
#include <stdlib.h>

#include <tchar.h>
#include <fcntl.h>
#include <io.h>

int __cdecl _tmain(int argc, TCHAR ** argv, TCHAR ** envp) {
    /* Initialize the BOM string */
    static const union {
        unsigned char bytes[sizeof (TCHAR) * 2];
        TCHAR c[2];
      } bom = BOM;
    FILE * f;
    TCHAR filename[] = _T("testfile.txt");
    int r;
    int rc;

    /* Assume failure */
    rc = EXIT_FAILURE;

    if (argc != 2) {
        _ftprintf(stderr, _T("Usage: %s <word>\n"), argv[0]);
        goto err_usage;
      }

    f = _tfopen(filename, _T("wb"));
    if (!f) {
        _ftprintf(stderr, _T("Could not open file: %s\n"), filename);
        goto err_fopen;
      }

    r = _ftprintf(f, _T("%s"), bom.c);
    if (r != _tcsclen(bom.c)) {
        _ftprintf(stderr, _T("Could not write BOM to file\n"));
        goto err_bom;
      }

    r = _ftprintf(f, _T("%s"), argv[1]);
    if (r !=  _tcsclen(argv[1])) {
        _ftprintf(stderr, _T("Could not write argument to file\n"));
        goto err_arg;
      }

    rc = EXIT_SUCCESS;

    err_arg:

    err_bom:

    fclose(f);
    err_fopen:

    err_usage:

    return rc;
  }
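To check what ended up in testfile.txt, one could read its first bytes back and look for the byte-order mark. A minimal sketch of the check itself, using a hypothetical helper named has_utf16le_bom:

```c
#include <stddef.h>

/* Hypothetical helper: returns 1 if the buffer starts with the bytes
   FF FE, the UTF-16LE byte-order mark. Note that a UTF-32LE BOM
   (FF FE 00 00) begins with the same two bytes, so this check alone
   cannot tell the two apart. */
int has_utf16le_bom(const unsigned char *buf, size_t len) {
    return len >= 2 && buf[0] == 0xFF && buf[1] == 0xFE;
}
```

For the argument ö, the file would be expected to begin with FF FE F6 00 in a _UNICODE build.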

Here are some other resources that might be useful:

_tfopen: http://msdn.microsoft.com/en-us/library/yeby3zcb.aspx

_ftprintf: http://msdn.microsoft.com/en-us/library/xkh07fe2.aspx

_setmode: http://msdn.microsoft.com/en-us/library/tw4k6df8.aspx

Unicode with text and binary streams: http://msdn.microsoft.com/en-us/library/c4cy2b8e.aspx

SBCS, MBCS, and Unicode functions: http://msdn.microsoft.com/en-us/library/tsbaswba.aspx
