Until now I have been working on Windows, where I use this code to get the extended ASCII characters:
extendedascii=rawToChar(as.raw(seq(128,255,by=1)),multiple=TRUE)
This gives me a vector containing the characters I need.
[1] "€" "" "‚" "ƒ" "„" "…" "†" "‡" "ˆ" "‰" "Š" "‹" "Œ" "" "Ž" "" "" "‘" "’" "“" "”" "•" "–" "—" "˜" "™" "š" "›" "œ" "" "ž" "Ÿ" " " "¡" "¢" "£" "¤" "¥" "¦"
[40] "§" "¨" "©" "ª" "«" "¬" "" "®" "¯" "°" "±" "²" "³" "´" "µ" "¶" "·" "¸" "¹" "º" "»" "¼" "½" "¾" "¿" "À" "Á" "Â" "Ã" "Ä" "Å" "Æ" "Ç" "È" "É" "Ê" "Ë" "Ì" "Í"
[79] "Î" "Ï" "Ð" "Ñ" "Ò" "Ó" "Ô" "Õ" "Ö" "×" "Ø" "Ù" "Ú" "Û" "Ü" "Ý" "Þ" "ß" "à" "á" "â" "ã" "ä" "å" "æ" "ç" "è" "é" "ê" "ë" "ì" "í" "î" "ï" "ð" "ñ" "ò" "ó" "ô"
[118] "õ" "ö" "÷" "ø" "ù" "ú" "û" "ü" "ý" "þ" "ÿ"
Now, on Linux, the same code gives me this:
[1] "\x80" "\x81" "\x82" "\x83" "\x84" "\x85" "\x86" "\x87" "\x88" "\x89" "\x8a" "\x8b" "\x8c"
[14] "\x8d" "\x8e" "\x8f" "\x90" "\x91" "\x92" "\x93" "\x94" "\x95" "\x96" "\x97" "\x98" "\x99"
[27] "\x9a" "\x9b" "\x9c" "\x9d" "\x9e" "\x9f" "\xa0" "\xa1" "\xa2" "\xa3" "\xa4" "\xa5" "\xa6"
[40] "\xa7" "\xa8" "\xa9" "\xaa" "\xab" "\xac" "\xad" "\xae" "\xaf" "\xb0" "\xb1" "\xb2" "\xb3"
[53] "\xb4" "\xb5" "\xb6" "\xb7" "\xb8" "\xb9" "\xba" "\xbb" "\xbc" "\xbd" "\xbe" "\xbf" "\xc0"
[66] "\xc1" "\xc2" "\xc3" "\xc4" "\xc5" "\xc6" "\xc7" "\xc8" "\xc9" "\xca" "\xcb" "\xcc" "\xcd"
[79] "\xce" "\xcf" "\xd0" "\xd1" "\xd2" "\xd3" "\xd4" "\xd5" "\xd6" "\xd7" "\xd8" "\xd9" "\xda"
[92] "\xdb" "\xdc" "\xdd" "\xde" "\xdf" "\xe0" "\xe1" "\xe2" "\xe3" "\xe4" "\xe5" "\xe6" "\xe7"
[105] "\xe8" "\xe9" "\xea" "\xeb" "\xec" "\xed" "\xee" "\xef" "\xf0" "\xf1" "\xf2" "\xf3" "\xf4"
[118] "\xf5" "\xf6" "\xf7" "\xf8" "\xf9" "\xfa" "\xfb" "\xfc" "\xfd" "\xfe" "\xff"
I tried Encoding(extendedascii) and got "unknown" for every element.
I also tried iconv(extendedascii, from="UTF-8", to="ASCII") and ended up with NAs.
I think my basic problem is that I don't know what encoding my text is in, and my machine may not know or recognize it either. Any help?
Answer 0: (score: 3)
There is no such thing as extended ASCII. The encoding you had on Windows is called Windows-1252 or CP-1252. iconv knows it well.
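As a minimal sketch of that conversion, assuming your iconv build accepts the name "CP1252" (iconvlist() shows the names available on your system), the bytes from the question can be declared as CP-1252 and re-encoded as UTF-8. The variable name utf8chars is introduced here only for illustration.

```r
# Bytes 128..255, one single-byte string per element (as in the question).
extendedascii <- rawToChar(as.raw(128:255), multiple = TRUE)

# Tell iconv what the bytes really are (CP-1252) and ask for UTF-8 output.
# Assumption: the encoding name "CP1252" is known to the local iconv.
utf8chars <- iconv(extendedascii, from = "CP1252", to = "UTF-8")

# Element 1 is byte 0x80, the euro sign in CP-1252.
utf8chars[1]
Encoding(utf8chars[1])

# A few bytes (0x81, 0x8D, 0x8F, 0x90, 0x9D) are unassigned in CP-1252;
# depending on the iconv implementation they may come back as NA.
which(is.na(utf8chars))
```

The attempt in the question returned NAs because from="UTF-8" misdescribes the input: a lone byte in the range 0x80 to 0xFF is not a valid UTF-8 sequence, and none of these characters exist in plain ASCII anyway.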
If you have many files in this encoding, you will probably want to keep using iconv on Linux; otherwise, it makes sense to switch to UTF-8 once and for all.
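For files, one possible approach is sketched below. It assumes a hypothetical legacy file (simulated here with a temporary file holding two CP-1252 bytes) and again assumes that "CP1252" is a name your iconv recognizes; path, utf8path, legacy and lines are illustrative names, not anything from the question.

```r
# Stand-in for a legacy Windows text file: e-acute (0xE9), euro sign (0x80),
# and a newline, written as raw CP-1252 bytes.
path <- tempfile(fileext = ".txt")
writeBin(as.raw(c(0xE9, 0x80, 0x0A)), path)

# Read the lines as-is; R does not know their encoding at this point.
legacy <- readLines(path, warn = FALSE)

# Convert explicitly from CP-1252 to UTF-8.
lines <- iconv(legacy, from = "CP1252", to = "UTF-8")

# One-off migration: write the converted text back out as UTF-8.
# useBytes = TRUE writes the UTF-8 bytes without re-encoding them
# to the native locale encoding.
utf8path <- tempfile(fileext = ".txt")
writeLines(lines, utf8path, useBytes = TRUE)

# Read the migrated file back, marking the strings as UTF-8.
roundtrip <- readLines(utf8path, encoding = "UTF-8")
```

Converting with an explicit iconv call keeps the result independent of the current locale, which is why the sketch avoids relying on the session's native encoding.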