Strange characters in R

Time: 2018-05-30 01:23:46

Tags: r utf-8 character-encoding

I am trying to load a .csv file in R, and I get something like this:

<f3>?<e9><U+00BC>?<e4><f3> . 

I have set my default text encoding to UTF-8 in the global options. Could R be encoding the apostrophes in some special way on export?

df <- read.csv("text.csv", encoding = "UTF-8", header = TRUE, stringsAsFactors = FALSE)

####Original CSV (Open in Notepad++)####
I don?ó?é¼?äót want
Jes?ÇÖs in the Family
others that wasn?ó?é¼?äót resolved and told
Am really happy with the this ?ƒÿü,
new ?ó?é¼?ôunbreakable?ó?é¼?¥ 
on the freeway?Ǫ.

####Load in R####
I don?<f3>?<e9><U+00BC>?<e4><f3>t want
Jes?<c7><d6>s in the Family
others that wasn?<f3>?<e9><U+00BC>?<e4><f3>t resolved and told
Am really happy with the this ?<U+0083><ff><fc>
new ?<f3>?<e9><U+00BC>?<f4>unbreakable?<f3>?<e9><U+00BC>?<U+00A5> 
on the freeway?<U+01EA>.

####What I want####
Because I don't want
Jes's in the Family
others that wasn't resolved and told
Am really happy with the this 
new 'unbreakable'
on the freeway….

Thanks.

2 Answers:

Answer 0 (score: 0)

You can do it like this:

Here x is your given data as a single string, as follows:

x <- "I don?ó?é¼?äót want Jes?ÇÖs in the Family others that wasn?ó?é¼?äót resolved and told Am really happy with the this ?ƒÿü, new ?ó?é¼?ôunbreakable? ?é¼?¥ on the freeway?Ǫ."

You can combine gsub with iconv to get almost the desired result. I don't know how to get the smiley in your output:

 gsub("\\?+","'",iconv(x, "latin1", "ASCII", sub=""))

Output:

[1] "I don't want
     Jes's in the Family
     others that wasn't resolved and told
     Am really happy with the this ',
     new 'unbreakable'on the freeway'."
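
The approach in this answer can be wrapped in a small function so it can be applied to a whole column at once. This is only a sketch: `clean_mojibake` is a hypothetical name, not part of any package, and the sample strings are taken from the question with the non-ASCII characters written as `\u` escapes.

```r
# Hypothetical helper (not from any package): drop every non-ASCII byte,
# then collapse each leftover run of "?" into a single apostrophe.
clean_mojibake <- function(x) {
  ascii_only <- iconv(x, from = "latin1", to = "ASCII", sub = "")
  gsub("\\?+", "'", ascii_only)
}

# Two lines from the question, with the garbled characters as escapes.
samples <- c(
  "I don?\u00f3?\u00e9\u00bc?\u00e4\u00f3t want",
  "Jes?\u00c7\u00d6s in the Family"
)

cleaned <- clean_mojibake(samples)
print(cleaned)
```

Keep in mind that this is lossy: every run of question marks becomes an apostrophe, including genuine ones, and characters such as the ellipsis or the smiley cannot be recovered this way.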

Answer 1 (score: 0)

You should try converting from UTF-8 to ASCII:

dt <- iconv(dt, 'utf-8', 'ascii', sub='')

Note that iconv is part of base R, so no extra package such as tm needs to be loaded.
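
If `dt` is a data frame rather than a character vector (an assumption, since the answer does not say), it is probably safer to convert each character column in turn, because `iconv` expects a character vector. A minimal sketch with made-up sample data; the column names `text` and `n` are illustrative only:

```r
# Made-up data frame standing in for the loaded CSV (assumption).
df <- data.frame(
  text = c("caf\u00e9", "plain"),
  n = 1:2,
  stringsAsFactors = FALSE
)

# Find the character columns and convert only those; characters that
# cannot be represented in ASCII are replaced using the `sub` argument.
is_chr <- vapply(df, is.character, logical(1))
df[is_chr] <- lapply(df[is_chr], iconv, from = "UTF-8", to = "ASCII", sub = "")

print(df)
```

Numeric columns are left untouched, and strings that were already plain ASCII come back unchanged.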