Question

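I am trying to get Eclipse's console to work with Polish (or any other non-English) characters, in either ANSI or UTF-8 encoding. It seems that on Windows R can only work with ANSI encodings, while Eclipse's console forces either UTF-8 or ISO-8859-1.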

Trying to use ANSI CP-1250 (the default Polish encoding on Windows), I:

encoded the R script file as ANSI CP-1250
set the Eclipse properties to cp1250, including the "R Script File" content type (General -> Content Types -> Text), the "Text file encoding" (General -> Workspace), and the console encoding ("Run Configurations" -> "R Console" -> "Common")
set the JVM properties in eclipse.ini by adding the lines "-Dclient.encoding.override=cp1250" and "-Dfile.encoding=cp1250"

None of this had any effect. How can I force Eclipse to encode and display text in R's locale?

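Exactly the same behavior persists when all of these options are set to 'UTF-8' instead of 'CP-1250'. Note that I cannot set R's locale to 'UTF-8' on Windows. It is also worth mentioning that Rstudio, Rgui and Rterm have no problems with the default CP-1250 encoding, and strings are displayed correctly there.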
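
To confirm what the R session itself believes about its locale, independently of any Eclipse or StatET setting, a check along these lines may help. It is only a diagnostic sketch built on base R's l10n_info() and Sys.getlocale(); the variable names are illustrative and nothing here changes the console encoding.

```r
# Diagnostic sketch: report which encoding the running R session treats as native.
info  <- l10n_info()                 # list with MBCS, UTF-8 and Latin-1 flags
ctype <- Sys.getlocale("LC_CTYPE")   # e.g. "Polish_Poland.1250" on a Polish Windows setup

cat("LC_CTYPE:", ctype, "\n")
cat("Native encoding is UTF-8:", info[["UTF-8"]], "\n")
```

If the second line reports FALSE, any UTF-8 bytes reaching R unmarked will be interpreted in the native code page, which seems consistent with the outputs shown in this question.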

Executed script:

print(Sys.getlocale())
Sys.setenv(LANG = 'pl_PL.cp1250')

x <- 'ąęłóżść'
message('Printing variable'); print(x); print(charToRaw(x))

Output 1: 'Run via source' -> string is encoded in ANSI CP1250, but printed as ISO-8859-1

> source("C:/mjktfw/pit/workspace/test_encoding/run3.R", echo=FALSE, encoding="cp1250")
[1] "LC_COLLATE=Polish_Poland.1250;LC_CTYPE=Polish_Poland.1250;LC_MONETARY=Polish_Poland.1250;LC_NUMERIC=C;LC_TIME=Polish_Poland.1250"
Printing variable
[1] "¹ê³ó
[1] b9 ea b3 f3 bf 9c e6
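
The raw bytes in Output 1 can be checked offline to see which interpretation yields the intended text. The following is a minimal sketch, assuming the iconv() build in use knows the encoding names "CP1250" and "latin1"; the variable names are made up for illustration.

```r
# The seven bytes printed by charToRaw(x) in Output 1.
bytes <- as.raw(c(0xb9, 0xea, 0xb3, 0xf3, 0xbf, 0x9c, 0xe6))
s <- rawToChar(bytes)

# Decode the same bytes under two different assumptions about their encoding.
as_cp1250 <- iconv(s, from = "CP1250", to = "UTF-8")  # what R intended
as_latin1 <- iconv(s, from = "latin1", to = "UTF-8")  # what the console appears to assume

# Compare the resulting Unicode code points rather than relying on console display.
print(sprintf("U+%04X", utf8ToInt(as_cp1250)))
print(sprintf("U+%04X", utf8ToInt(as_latin1)))
```

Read as CP1250, the bytes map to the code points of the seven Polish letters in the script; read as ISO-8859-1, each byte maps to the code point with the same numeric value, which matches the mangled characters the console displays.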

Output 2: 'Run via submitting directly' -> string is encoded in UTF-8, printed correctly

> print(Sys.getlocale())
[1] "LC_COLLATE=Polish_Poland.1250;LC_CTYPE=Polish_Poland.1250;LC_MONETARY=Polish_Poland.1250;LC_NUMERIC=C;LC_TIME=Polish_Poland.1250"
> Sys.setenv(LANG = 'pl_PL.cp1250')
> 
> x <- 'ąęłóżść'
> message('Printing variable'); print(x); print(charToRaw(x))
Printing variable
[1] "ąęłóżść"
 [1] c4 85 c4 99 c5 82 c3 b3 c5 bc c5 9b c4 87
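
One way to take the script file's encoding out of the picture when reproducing this is to write the test string with \u escapes, so the source is pure ASCII. This is a sketch rather than a fix; it assumes iconv() supports "CP1250", and the names x_utf8 and x_cp1250 are hypothetical.

```r
# Same test string as in the script, written with Unicode escapes (ASCII-only source).
x_utf8 <- "\u0105\u0119\u0142\u00f3\u017c\u015b\u0107"

# Bytes of the UTF-8 form: should match the 14 bytes shown in Output 2.
print(charToRaw(x_utf8))

# Convert to CP1250: should give the 7 bytes shown in Output 1.
x_cp1250 <- iconv(x_utf8, from = "UTF-8", to = "CP1250")
print(charToRaw(x_cp1250))
```

Comparing these byte sequences with what charToRaw(x) reports for each way of running the script shows which encoding actually reached R in that case.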

Output 3: copy-paste into the console -> string is encoded in UTF-8, printed correctly

> print(Sys.getlocale())
[1] "LC_COLLATE=Polish_Poland.1250;LC_CTYPE=Polish_Poland.1250;LC_MONETARY=Polish_Poland.1250;LC_NUMERIC=C;LC_TIME=Polish_Poland.1250"
> Sys.setenv(LANG = 'pl_PL.cp1250')
> 
> x <- 'ąęłóżść'
> message('Printing variable'); print(x); print(charToRaw(x))
Printing variable
[1] "ąęłóżść"
 [1] c4 85 c4 99 c5 82 c3 b3 c5 bc c5 9b c4 87

Output 4: 'Run entire command in R' shortcut -> string is encoded in ANSI CP1250, but printed as plain-text Unicode code points

> print(Sys.getlocale())
[1] "LC_COLLATE=Polish_Poland.1250;LC_CTYPE=Polish_Poland.1250;LC_MONETARY=Polish_Poland.1250;LC_NUMERIC=C;LC_TIME=Polish_Poland.1250"
> Sys.setenv(LANG = 'pl_PL.cp1250')
> x <- 'ąęłóżść'
> message('Printing variable')
Printing variable
> print(x)
[1] "<U+00B9><ea><U+00B3><f3><U+00BF><U+009C><e6>"
> print(charToRaw(x))
[1] b9 ea b3 f3 bf 9c e6

EDIT

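After some tinkering, the cases above turn out to give different results depending on the "raw" encoding of the string and on the Encoding()<- argument. The outputs below compare R / Rstudio and Eclipse / StatET behavior: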

Eclipse

> # UTF-8 encoded string
> char <- rawToChar(as.raw(c(0xea, 0xb3, 0x9c)))
> sprintf('String: %s', char); sprintf('Encoding: %s | Raw: %s', Encoding(char), paste(charToRaw(char), collapse = ' '))
[1] "String: 곜"
[1] "Encoding: unknown | Raw: ea b3 9c"
> 
> Encoding(char) <- 'UTF-8'
> sprintf('String: %s', char); sprintf('Encoding: %s | Raw: %s', Encoding(char), paste(charToRaw(char), collapse = ' '))
[1] "String: <U+ACDC>"
[1] "Encoding: UTF-8 | Raw: ea b3 9c"
> 
> # ANSI encoded string
> char <- rawToChar(as.raw(c(0xc4, 0x99, 0xc5, 0x82, 0xc5, 0x9b)))
> sprintf('String: %s', char); sprintf('Encoding: %s | Raw: %s', Encoding(char), paste(charToRaw(char), collapse = ' '))
[1] "String: ęłś"
[1] "Encoding: unknown | Raw: c4 99 c5 82 c5 9b"
> 
> Encoding(char) <- 'UTF-8'
> sprintf('String: %s', char); sprintf('Encoding: %s | Raw: %s', Encoding(char), paste(charToRaw(char), collapse = ' '))
[1] "String: 곜"
[1] "Encoding: UTF-8 | Raw: c4 99 c5 82 c5 9b"

Rstudio

> # UTF-8 encoded string
> char <- rawToChar(as.raw(c(0xea, 0xb3, 0x9c)))
> sprintf('String: %s', char); sprintf('Encoding: %s | Raw: %s', Encoding(char), paste(charToRaw(char), collapse = ' '))
[1] "String: ęłś"
[1] "Encoding: unknown | Raw: ea b3 9c"
> 
> Encoding(char) <- 'UTF-8'
> sprintf('String: %s', char); sprintf('Encoding: %s | Raw: %s', Encoding(char), paste(charToRaw(char), collapse = ' '))
[1] "String: 곜"
[1] "Encoding: UTF-8 | Raw: ea b3 9c"
> 
> # ANSI encoded string
> char <- rawToChar(as.raw(c(0xc4, 0x99, 0xc5, 0x82, 0xc5, 0x9b)))
> sprintf('String: %s', char); sprintf('Encoding: %s | Raw: %s', Encoding(char), paste(charToRaw(char), collapse = ' '))
[1] "String: Ä™Ĺ‚Ĺ›"
[1] "Encoding: unknown | Raw: c4 99 c5 82 c5 9b"
> 
> Encoding(char) <- 'UTF-8'
> sprintf('String: %s', char); sprintf('Encoding: %s | Raw: %s', Encoding(char), paste(charToRaw(char), collapse = ' '))
[1] "String: ęłś"
[1] "Encoding: UTF-8 | Raw: c4 99 c5 82 c5 9b"
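
When reading the transcripts above, it helps to keep in mind that Encoding()<- only changes the declared encoding mark of a string and leaves its bytes untouched, whereas iconv() actually re-encodes the bytes. A minimal sketch of that difference, assuming iconv() supports "CP1250":

```r
# Six bytes from the transcripts above (the UTF-8 form of three Polish letters).
x <- rawToChar(as.raw(c(0xc4, 0x99, 0xc5, 0x82, 0xc5, 0x9b)))

before <- charToRaw(x)
Encoding(x) <- "UTF-8"          # only sets the declared encoding mark
after <- charToRaw(x)

print(identical(before, after)) # the bytes are unchanged by Encoding()<-
print(Encoding(x))

# iconv() re-encodes: the same text in CP1250 is the other byte sequence used above.
y <- iconv(x, from = "UTF-8", to = "CP1250")
print(charToRaw(y))
```

So the two byte sequences tested in each transcript are the same three letters in UTF-8 and in CP1250, and the differences between Eclipse and Rstudio come down to how each front end interprets the bytes and the mark when displaying them.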

Modifying R console input/output encoding in Eclipse + StatET


0 answers: