Converting ASCII to UTF-8 in R with stringi

Asked: 2017-06-18 16:04:27

Tags: r utf-8 ascii stringi

I have the following problem:

library(stringi)
x_1<-"P N001361/01"
x_2<-"Р N001361/01"
x_1==x_2
[1] FALSE

> stri_enc_mark(x_1)
[1] "ASCII"
> stri_enc_mark(x_2)
[1] "UTF-8"

Then I tried:

stri_encode(x_1,"ASCII","UTF-8",to_raw=FALSE)==x_2

But this still does not work. Can anyone suggest how to make these two strings identical? (I am trying to merge x_1 with x_2.)

1 answer:

Answer 0 (score: 2):

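The problem is not the conversion. The problem is that the first letter of x_2 is not the Latin "P" but the Cyrillic capital letter Er (U+0420), which only looks the same: https://unicode-table.com/en/0420/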

This becomes obvious when you run:

> stri_encode(x_2,"UTF-8", "ASCII",to_raw=FALSE)
[1] "\032 N001361/01"
Warning message:
In stri_encode(x_2, "UTF-8", "ASCII", to_raw = FALSE) :
  the Unicode codepoint \U00000420 cannot be converted to destination encoding
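
The same difference can also be seen by comparing code points directly. The following is only a minimal sketch: it rebuilds x_2 with a `\u0420` escape (an assumption made so the snippet does not depend on the encoding of the source file) and uses `stri_enc_toutf32` and `stri_enc_isascii` from stringi.

```r
library(stringi)

x_1 <- "P N001361/01"        # starts with Latin capital P
x_2 <- "\u0420 N001361/01"   # starts with Cyrillic capital Er, written as an escape

# First code point of each string, as an integer
first_cp <- sapply(stri_enc_toutf32(c(x_1, x_2)), `[`, 1)
first_cp                       # 80 (U+0050) and 1056 (U+0420)

# TRUE only for strings made up entirely of ASCII characters
stri_enc_isascii(c(x_1, x_2))  # TRUE FALSE
```

Because ASCII is a subset of UTF-8, re-encoding x_1 from ASCII to UTF-8 leaves its bytes unchanged, which is why the attempt in the question cannot make the two strings equal.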

So you need to explicitly replace that character with the actual Latin letter "P":

x_2_rep <- stri_replace_all_fixed(x_2, "\u0420", "P")  # swap Cyrillic Er (U+0420) for Latin P
x_1 == x_2_rep
## TRUE
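
If other look-alike letters might appear in the data, the single replacement can be extended to a small character map. The sketch below is an assumption rather than part of the original answer: the helper name `fix_homoglyphs` and the lookup table are hypothetical, and the table covers only a few Cyrillic capitals that resemble Latin capitals. It relies on `stri_trans_char`, the stringi counterpart of base R's `chartr`.

```r
library(stringi)

# Hypothetical lookup table (not exhaustive): Cyrillic capitals and the
# Latin capitals they visually resemble, aligned position by position.
cyr <- "\u0410\u0412\u0415\u041A\u041C\u041D\u041E\u0420\u0421\u0422\u0425"
lat <- "ABEKMHOPCTX"

# Hypothetical helper: translate every character in `cyr` to its match in `lat`
fix_homoglyphs <- function(x) stri_trans_char(x, cyr, lat)

x_1 <- "P N001361/01"
x_2 <- "\u0420 N001361/01"   # Cyrillic Er, written as an escape

x_2_fixed <- fix_homoglyphs(x_2)
x_1 == x_2_fixed
```

A map like this is only safe when the field is known to contain Latin identifiers; applied to genuine Cyrillic text it would corrupt the data.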