What character encoding is used for Eclipse VM arguments?

Question

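We read an important parameter as a VM argument; it is the path to a file. Some users now pass VM arguments that contain Korean characters (their folders have Korean names), and the program breaks because the Korean characters are read as question marks. The experiment below shows the technical situation.

I tried debugging the program in Eclipse. In "Debug Configurations", on the "Arguments" tab, under "VM arguments", I entered:

-Dfilepath=d:\XXXX\카운터

But when I read it in the program with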

String filepath = System.getProperty("filepath");

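I get output with question marks, as shown below.

d:\XXXX\???

I understand that the Eclipse debug GUI uses the correct encoding (?) to display the characters, but when the value is read in the program, a different encoding is used that cannot read them correctly.

What default encoding does Java use to read the VM arguments supplied to it?

How can I change the encoding in Eclipse so that the program reads the characters correctly?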

Answer 1

My conclusion is that the conversion depends on the default encoding (the Windows setting "Language for non-Unicode programs"). Here is the test program:

package test;

import java.io.FileOutputStream;

public class Test {
    public static void main(String[] args) throws Exception {
        StringBuilder sb = new StringBuilder();
        // The literal in the first brackets comes from the source file, so it is the reference value.
        sb.append("[카운터] sysprop=[").append(System.getProperty("cenv"));
        if (args.length > 0) {
            sb.append("], cmd args=[").append(args[0]);
        }
        sb.append("], file.encoding=").append(System.getProperty("file.encoding"));
        // Write the result to a file instead of System.out
        FileOutputStream fout = new FileOutputStream("/testout");
        fout.write(sb.toString().getBytes("UTF-8"));
        fout.close();
        // Thread.sleep(10000); // For checking arguments using Process Explorer
    }
}
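
Alongside the test program, it can help to print what the JVM believes its encodings are and to dump the received values as code points, so that nothing depends on how the console renders them. The following is a minimal sketch, not part of the original test: the class name EncodingDiagnostics is hypothetical, and sun.jnu.encoding is an implementation-specific property of Oracle/OpenJDK builds that may be absent on other JVMs.

```java
import java.nio.charset.Charset;

class EncodingDiagnostics {

    /** Renders each code point as U+XXXX so the output is readable on any console. */
    static String codePoints(String s) {
        if (s == null) {
            return "(null)";
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
            i += Character.charCount(cp);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println("file.encoding    = " + System.getProperty("file.encoding"));
        // Assumption: present on Oracle/OpenJDK builds only; prints null elsewhere.
        System.out.println("sun.jnu.encoding = " + System.getProperty("sun.jnu.encoding"));
        System.out.println("defaultCharset   = " + Charset.defaultCharset());
        // "cenv" is the property name used by the test program above.
        System.out.println("cenv             = " + codePoints(System.getProperty("cenv")));
        for (int i = 0; i < args.length; i++) {
            System.out.println("args[" + i + "]          = " + codePoints(args[i]));
        }
    }
}
```

If the value arrived intact, 카운터 shows up as U+CE74 U+C6B4 U+D130; a value that was already replaced shows up as a run of U+003F.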

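Test 1: "Language for non-Unicode programs" is Korean (Korea)

Executed in the command prompt: java -Dcenv=카운터 test.Test 카운터 (when I verified the arguments with Process Explorer, the Korean characters were correct)

Result: [카운터] sysprop=[카운터], cmd args=[카운터], file.encoding=MS949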

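Test 2: "Language for non-Unicode programs" is Chinese (Traditional, Taiwan)

Executed in the command prompt (pasted from the clipboard): java -Dcenv=카운터 test.Test 카운터 (I could not see the Korean characters in the command window. However, when I verified the arguments with Process Explorer, the Korean characters were correct)

Result: [카운터] sysprop=[???], cmd args=[???], file.encoding=MS950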

Test 3: "Language for non-Unicode programs" is Chinese (Traditional, Taiwan)

Launched from Eclipse with the Program arguments and VM arguments set (the command line in Process Explorer is C:\pg\jdk160\bin\javaw.exe -agentlib:jdwp=transport=dt_socket,suspend=y,address=localhost:50672 -Dcenv=카운터 -Dfile.encoding=UTF-8 -classpath S:\ws\wtest\bin test.Test 카운터, which is the same as what you see in the Properties dialog of the Eclipse Debug view)

Result: [카운터] sysprop=[???], cmd args=[bin], file.encoding=UTF-8

Changing the Korean characters to "碁石", which exist in both the MS950 and MS949 character sets:

Test 1 result: [碁石] sysprop=[碁石], cmd args=[碁石], file.encoding=MS949
Test 2 result: [碁石] sysprop=[碁石], cmd args=[碁石], file.encoding=MS950
Test 3 result: [碁石] sysprop=[碁石], cmd args=[碁石], file.encoding=UTF-8

Changing the Korean characters to "鈥焢", which exist in the MS950 character set:

Test 1 result: [鈥焢] sysprop=[??], cmd args=[??], file.encoding=MS949
Test 2 result: [鈥焢] sysprop=[鈥焢], cmd args=[鈥焢], file.encoding=MS950
Test 3 result: [鈥焢] sysprop=[鈥焢], cmd args=[鈥焢], file.encoding=UTF-8

Changing the Korean characters to "宽广", which exist in the GBK character set:

Test 1 result: [宽广] sysprop=[??], cmd args=[??], file.encoding=MS949
Test 2 result: [宽广] sysprop=[??], cmd args=[??], file.encoding=MS950
Test 3 result: [宽广] sysprop=[??], cmd args=[??], file.encoding=UTF-8

Test 4: To verify my assumption, I changed "Language for non-Unicode programs" to Chinese (Simplified, PRC) and executed java -Dcenv=宽广 test.Test 宽广 in the command prompt

Result: [宽广] sysprop=[宽广], cmd args=[宽广], file.encoding=GBK

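During the tests I always checked the command line with Process Explorer and made sure all the characters were correct. However, the command-argument characters are converted using the default encoding before main(String[] args) of the Java class is invoked. If any of the characters does not exist in the character set of the default encoding, the program receives unexpected arguments.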
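
The effect described here can be approximated inside Java by encoding a string into a charset and decoding it again. This is only a sketch of the lossy conversion, not the launcher's actual code path: the class and method names are hypothetical, and MS949, MS950 and GBK are looked up by name because they are extended charsets that may be missing from a reduced JRE.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class LossyConversion {

    /** Encodes text into the charset and decodes it again, as a stand-in for the conversion. */
    static String roundTrip(String text, Charset charset) {
        // String.getBytes(Charset) silently replaces unmappable characters with the
        // charset's replacement byte ('?' for US-ASCII) instead of failing.
        return new String(text.getBytes(charset), charset);
    }

    /** True when every character of text exists in the charset. */
    static boolean survives(String text, Charset charset) {
        return charset.newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        String korean = "\uCE74\uC6B4\uD130"; // the Korean folder name from the question

        // US-ASCII is always available, so it demonstrates the replacement on any JRE.
        System.out.println("US-ASCII: " + roundTrip(korean, StandardCharsets.US_ASCII)); // prints "US-ASCII: ???"

        // The charsets from the tests; skipped when this JRE does not ship them.
        for (String name : new String[] {"MS949", "MS950", "GBK"}) {
            if (!Charset.isSupported(name)) {
                System.out.println(name + ": not available in this JRE");
                continue;
            }
            System.out.println(name + ": survives=" + survives(korean, Charset.forName(name)));
        }
    }
}
```

A check like survives() could be used to warn early when a path cannot be represented in the default encoding, rather than failing later with a path full of question marks.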

I am not sure whether the problem is caused by java.exe/javaw.exe or by Windows. Either way, passing non-ASCII arguments on the command line is not a good idea.
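
One way to avoid non-ASCII command arguments is to keep the command line ASCII-only and put the real path into a small text file that the program reads with an explicit encoding. The sketch below assumes Java 7 or later and a UTF-8 file whose first line is the path; the class name Utf8PathFile and the file layout are hypothetical, not something the original program used.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

class Utf8PathFile {

    /** Reads the first line of a UTF-8 text file and returns it as the configured path. */
    static String readPath(Path file) throws IOException {
        List<String> lines = Files.readAllLines(file, StandardCharsets.UTF_8);
        if (lines.isEmpty()) {
            throw new IOException("empty file: " + file);
        }
        String first = lines.get(0);
        // Some Windows editors prepend a byte order mark when saving as UTF-8; drop it.
        if (!first.isEmpty() && first.charAt(0) == '\uFEFF') {
            first = first.substring(1);
        }
        return first.trim();
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java Utf8PathFile <ascii-path-to-utf8-file>");
            return;
        }
        // Only the ASCII location of the text file travels through the command line.
        String filepath = readPath(Paths.get(args[0]));
        System.out.println("exists: " + Files.exists(Paths.get(filepath)));
    }
}
```

Because the file is decoded explicitly as UTF-8, the result does not depend on the "Language for non-Unicode programs" setting.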

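BTW, I also tried running the command from a .bat file (file encoding UTF-8), in case anyone is interested:

Test 5: "Language for non-Unicode programs" is Korean (Korea)

The command line in Process Explorer is java -Dcenv=移댁슫?? test.Test 移댁슫?? (the Korean characters are corrupted)

Result: [카운터] sysprop=[移댁슫??], cmd args=[移댁슫??], file.encoding=MS949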

Test 6: "Language for non-Unicode programs" is Korean (Korea)

With an additional VM argument. The command line in Process Explorer is java -Dfile.encoding=UTF-8 -Dcenv=移댁슫?? test.Test 移댁슫?? (the Korean characters are corrupted)

Result: [카운터] sysprop=[移댁슫??], cmd args=[移댁슫??], file.encoding=UTF-8

Test 7: "Language for non-Unicode programs" is Chinese (Traditional, Taiwan)

The command line in Process Explorer is java -cp s:\ws\wtest\bin -Dcenv=儦渥?? test.Test 儦渥?? (the Korean characters are corrupted)

Result: [카운터] sysprop=[儦渥??], cmd args=[儦渥??], file.encoding=MS950
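
One plausible reading of Tests 5 to 7 is that the UTF-8 bytes of the .bat file were interpreted in the system code page before the command line was built. The sketch below reproduces that kind of misdecoding in Java; it is an illustration under that assumption, the class name is hypothetical, and MS949 is looked up by name because it may be missing from a reduced JRE.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class BatFileMojibake {

    /** Encodes text as it was saved, then decodes the bytes with a different charset. */
    static String misdecode(String text, Charset writtenAs, Charset readAs) {
        return new String(text.getBytes(writtenAs), readAs);
    }

    public static void main(String[] args) {
        // Portable demonstration: U+00E9 saved as UTF-8 (bytes C3 A9) and read as
        // ISO-8859-1 turns into the two characters U+00C3 U+00A9.
        String latin = misdecode("\u00E9", StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);
        System.out.println(latin.length()); // prints 2

        // The situation from Test 5, if this JRE ships the MS949 charset.
        if (Charset.isSupported("MS949")) {
            String korean = "\uCE74\uC6B4\uD130";
            String garbled = misdecode(korean, StandardCharsets.UTF_8, Charset.forName("MS949"));
            System.out.println(garbled.equals(korean)); // prints false
        }
    }
}
```

Saving the .bat file in the encoding that matches the system code page, rather than UTF-8, avoids this particular corruption, although characters outside that code page still cannot be passed.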
