Question

I'm trying to understand some behavior I'm seeing.

I have this C++ program:

// Outputter.cpp : Defines the entry point for the console application.
//

#include "stdafx.h"
#include <iostream>


int main()
{
    // UTF-8 bytes for "日本語"
    std::cout << (char)0xE6 << (char)0x97 << (char)0xA5 << (char)0xE6 << (char)0x9C << (char)0xAC << (char)0xE8 << (char)0xAA << (char)0x9E;
    return 0;
}

If I run the following in PowerShell:

[System.Console]::OutputEncoding = [System.Console]::InputEncoding = [System.Text.Encoding]::UTF8
.\print_it.exe # This is the above program ^
日本語 # This is the output as displayed in Powershell

then 日本語 is printed and displayed correctly in PowerShell.

However, if I add setlocale(LC_ALL, "English_United States.1252"); to the code, like this:

#include <clocale>  // declares setlocale and LC_ALL

int main()
{
    setlocale(LC_ALL, "English_United States.1252");

    // UTF-8 bytes for "日本語"
    std::cout << (char)0xE6 << (char)0x97 << (char)0xA5 << (char)0xE6 << (char)0x9C << (char)0xAC << (char)0xE8 << (char)0xAA << (char)0x9E;
    return 0;
}

The program now prints garbage to PowerShell (æ—¥æœ¬èªž, to be exact, which is what you get when code page 1252 misinterprets those bytes).

However, if I redirect the output to a file and then cat the file, it looks fine:

.\print_it.exe > out.txt
cat out.txt
日本語 # It displays fine, like this, if I redirect to a file and cat the file.

Also, Git Bash displays the output correctly no matter what I do with setlocale.

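Can someone help me understand why setlocale affects how the output is displayed in PowerShell, even though the same bytes are written to stdout? It seems as if PowerShell can somehow access the program's locale and uses it to interpret the output?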

The PowerShell version is 5.1.17763.592.

Answer 1

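It's all about encoding. The reason you get the correct characters with > redirection is that > uses UTF-16LE by default: PowerShell reads the program's output and re-encodes it when it writes the file, so the output is automatically converted to UTF-16.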

Depending on your PowerShell version, you may or may not be able to change the encoding used by redirection.

If you use Out-File with the -Encoding switch, you can change the encoding of the target file (again, depending on your PowerShell version).

I recommend reading mklement0's excellent post on the topic here.

Edit based on comments

Taken from cppreference:

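std::setlocale (C++ Localization library, defined in header <clocale>)
char* setlocale( int category, const char* locale );

The setlocale function installs the specified system locale or its portion as the new C locale. The modifications remain in effect and influence the execution of all locale-sensitive C library functions until the next call to setlocale. If locale is a null pointer, setlocale queries the current C locale without modifying it.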

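The bytes you send to std::cout are the same, but std::cout's output goes through the locale-sensitive C library, so the locale takes precedence over PowerShell's UTF-8 setting. If you omit the setlocale() call, the program stays in the default "C" locale, the bytes are written unchanged, and they are interpreted using the shell's encoding.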

If you have PowerShell 5.1 or later, > is an alias for Out-File, so you can set the encoding via $PSDefaultParameterValues, like this:

$PSDefaultParameterValues['Out-File:Encoding'] = 'UTF8'

You will then get a UTF-8 file (whose BOM can be annoying!) instead of the default UTF-16LE.

Edit: adding some details at the OP's request

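PowerShell uses the OEM code page, so by default you get whatever Windows is set to. I recommend reading the excellent article on encoding on Windows. The key point is that if you don't set PowerShell to UTF-8, you are working with a code page.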

output.exe sets the locale to English_United States.1252 in the C++ program, while output_original.exe doesn't change it:

Here is the output without the UTF-8 PowerShell setting:

c:\t>.\output.exe
æ-¥æo¬èªz  --> nonsense within the win1252 code page
c:\t>.\output.exe | hexdump
0000000 97e6 e6a5 ac9c aae8 009e --> both hex outputs are the same!
0000009
c:\t>.\output_original.exe
µùÑµ£¼Φ¬₧  --> nonsense, but a different one! (depends on your locale setup; mine was English)
c:\t>.\output_original.exe | hexdump
0000000 97e6 e6a5 ac9c aae8 009e  --> both hex outputs are the same!
0000009

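So what is going on here? Your program produces its output according to the locale set either by the program itself or by Windows (the OEM code page on my virtual machine, which, judging by the µùÑµ£¼Φ¬₧ output, is code page 437). Note that the hexdump is the same in both versions, but the displayed output (with its encoding) differs.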

If you set PowerShell to UTF-8 using [System.Text.Encoding]::UTF8:

PS C:\t> [System.Console]::OutputEncoding = [System.Console]::InputEncoding = [System.Text.Encoding]::UTF8
PS C:\t> .\output.exe 
æ—¥æœ¬èªž  --> English locale 1252 set within the program; notice that the output is similar to the one above (but the hexdump is different)
PS C:\t> .\output.exe | hexdump
0000000 bbef 3fbf 3f3f 0a0d  -> again the hex dump is the same for both, so they produce the same output!
0000008
PS C:\t> .\output_original.exe
日本語 --> correct output, because you have forced the PowerShell encoding to UTF-8, which removes the output's dependence on the Windows OEM code page
PS C:\t> .\output_original.exe | hexdump
0000000 bbef 3fbf 3f3f 0a0d -> again the hex dump is the same for both, so they produce the same output!
0000008

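What is going on here? If you force a locale in the C++ application, std::cout's output goes through that locale (1252): the bytes are treated as code page 1252 characters, and those characters are then converted to the console's encoding, UTF-8 here (which is why the first and second examples differ slightly). When you don't force a locale in the C++ application, the bytes are passed through unchanged and interpreted with PowerShell's encoding, which is now UTF-8, so you get the correct output.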

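One thing I found interesting: if you change the Windows system locale to a Chinese-compatible locale (China, Macau, Taiwan, Hong Kong, etc.), you get some Chinese characters when you don't force UTF-8, but not the ones you expect. This means those bytes only make sense as Unicode (UTF-8), so they only display correctly when decoded that way. If you force UTF-8 in PowerShell, it works fine even with a Chinese Windows system locale.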

I hope this answers your question more fully.

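Rant: It took me a long time to investigate this, because my VS 2019 Community Edition had expired (WTF, MS?) and I couldn't register it, as the registration window was completely blank. Thanks MS, but no thanks.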

Why does the LC_ALL setlocale setting affect cout output in PowerShell?
