R: Changing the encoding of a data frame column

Time: 2016-03-14 12:18:33

Tags: r encoding utf-8 character-encoding iconv

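I am investigating how character encoding affects sorting. My question is:

How do I change a single column of a data frame to a different character encoding?

For context, I have included a few extra steps at the bottom.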

1) Create the data frame:

d.enc <- data.frame( utf8 = c(" ", "_ ", " _"), 
                     mac = c(" ", "_ ", " _"), 
                     label = c("space", "underscore space", "space underscore") )

2) Convert to character vectors and try to set the encoding:

d.enc2 <- d.enc  # start from a copy so that d.enc2 exists before its columns are assigned
d.enc2$utf8 <- as.character(d.enc$utf8)
d.enc2$mac <- as.character(d.enc$mac)
d.enc2$label <- as.character(d.enc$label)

Encoding(d.enc2$utf8) <- "UTF-8"
Encoding(d.enc2$mac) <- "MACINTOSH"
Encoding(d.enc2$utf8)
# [1] "unknown" "unknown" "unknown"
Encoding(d.enc2$mac)
# [1] "unknown" "unknown" "unknown"

3) This is not what I was hoping for. I expected:

# [1] "UTF-8" "UTF-8" "UTF-8" and
# [1] "MACINTOSH" "MACINTOSH" "MACINTOSH"
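A likely explanation for the "unknown" result, offered as a sketch rather than a definitive diagnosis: R never marks pure ASCII strings with a declared encoding, because their bytes are identical in every supported encoding. In addition, the only marks `Encoding()` reports are "latin1", "UTF-8", "bytes" and "unknown", so "MACINTOSH" cannot appear as a mark. The variable names below are illustrative and not part of the question.

```r
# ASCII-only strings: the mark is not kept, whatever is requested.
ascii <- c(" ", "_ ", " _")
Encoding(ascii) <- "UTF-8"
ascii_marks <- Encoding(ascii)
ascii_marks

# A string containing a non-ASCII byte (0xE7 is c-cedilla in Latin-1)
# does keep its declared encoding.
non_ascii <- "fa\xE7ile"
Encoding(non_ascii) <- "latin1"
non_ascii_mark <- Encoding(non_ascii)
non_ascii_mark
```

So the columns in step 2 stay "unknown" because their contents are spaces and underscores only, not because the assignment failed.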

4) Are the encodings I want supported? (Running on a Mac)

temp <- iconvlist()
temp[399]
# [1] "UTF-8"
temp[338]
# [1] "MACINTOSH"

They appear to be supported.
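The positions 399 and 338 are specific to one machine, since the contents and order of `iconvlist()` depend on the platform's iconv implementation. A minimal sketch of a lookup by name instead (the variable names `available`, `wanted` and `supported` are my own):

```r
# Look encodings up by name rather than by position in the list.
available <- toupper(iconvlist())
wanted <- c("UTF-8", "MACINTOSH")
supported <- setNames(wanted %in% available, wanted)
supported
```

A `TRUE` here only says that `iconv()` can convert to or from that encoding. It does not mean `Encoding()` can use it as a mark.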

5) Once I can change the encoding, I would like to do the following to see how the sort order changes:

library(dplyr)
arrange(d.enc2, desc(utf8))
arrange(d.enc2, desc(mac))

6) I expect the output to look something like this, but in a different order depending on which column is used for sorting:

  utf8 mac            label
1   _   _  underscore space
2    _   _ space underscore
3                     space
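One point that may help with steps 5 and 6: the three test strings are pure ASCII, so their bytes are identical in UTF-8 and MACINTOSH, and re-encoding them cannot change their order. What does change the order of such strings is the collation rule used for comparison. The sketch below uses base R only and assumes nothing about dplyr; `method = "radix"` sorts character vectors in C-locale (byte) order, while the default method follows the session's collation locale.

```r
x <- c(" ", "_ ", " _")

# Byte-wise (C-locale) order: space (0x20) sorts before underscore (0x5F).
asc_bytes <- sort(x, method = "radix")
desc_bytes <- sort(x, decreasing = TRUE, method = "radix")
desc_bytes

# Locale-aware order; the result depends on Sys.getlocale("LC_COLLATE").
desc_locale <- sort(x, decreasing = TRUE)
desc_locale
```

The byte-wise descending result is "_ ", " _", " ", which matches the table in step 6. I believe recent dplyr versions expose a similar choice through a `.locale` argument to `arrange()`, but check the documentation for the installed version.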

Thanks for any tips!

1 Answer:

Answer 0: (score: 0)

Maybe a bit late, but I saw this here: R- Changing encoding of column in dataframe?

for (col in colnames(mydataframe)) {
  Encoding(mydataframe[[col]]) <- "UTF-8"
}
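Two caveats about the loop above, with a sketch of one possible alternative. `Encoding<-` only changes the declared mark and leaves the bytes untouched, and it expects a character vector, so the loop stops with an error on a numeric column. To actually convert the bytes of the character columns, `iconv()` is the usual tool. The data frame `df` and the assumption that its text is Latin-1 are illustrative, not taken from the question.

```r
# Illustrative data: one Latin-1 text column and one numeric column.
txt <- "fa\xE7ile"
Encoding(txt) <- "latin1"
df <- data.frame(txt = txt, n = 1, stringsAsFactors = FALSE)

# Convert only the character columns; other columns are left alone.
is_chr <- vapply(df, is.character, logical(1))
df[is_chr] <- lapply(df[is_chr], iconv, from = "latin1", to = "UTF-8")

converted_mark <- Encoding(df$txt)
converted_mark
```

For a single column, `df$txt <- iconv(df$txt, from = "latin1", to = "UTF-8")` does the same thing. Pure ASCII values will still report "unknown" after the conversion.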