Using the Encoding function

Date: 2015-11-16 09:19:11

Tags: r encoding

I use the following code to import special characters in R:

Encoding(self$Data$Skills) <- "UTF-8"

However, when I rename the column with:

colnames(self$Data) <- 'skills2'

and then run:

Encoding(self$Data$skills2) <- "UTF-8"

I get the following error:

Error in `Encoding<-`(`*tmp*`, value = "UTF-8") : 
a character vector argument expected

I don't understand why this happens. Any ideas? The same thing also happens when I sample rows from this data frame, using:

self$Data <- data.frame(df[sample(nrow(self$Data),dim(self$Data)[1]*samplePersentance),])

The column name changes, and when I then call the Encoding function I get the same error. The data was imported with read.csv.

Edit: head of the data

                         Skills
1                          null
2                           "'"
3                  "'Fin Gaap'"
4 "'Knæ-igennem-hinanden-tr..."
5 "'Mønt-dans-på-knoerne-tr..."
6  "'Necessary knowledge of..."

> typeof(self$Data)
[1] "list"

> class(self$Data)
[1] "data.frame"

To reproduce the error:

try1 <- structure(list(Skills = c("null", "\"'\"", "\"'Fin Gaap'\"", 
"\"'Knæ-igennem-hinanden-tr...\"", "\"'Mønt-dans-på-knoerne-tr...\"", 
"\"'Necessary knowledge of...\"")), .Names = "Skills", row.names = c(NA, 
6L), class = "data.frame")


Encoding(try1$Skills) <- 'UTF-8'
#the function runs normally
try2 <- data.frame(try1[sample(nrow(try1),floor(dim(try1)[1]*0.5)),])
colnames(try2) <- 'skills2'
Encoding(try2$skills2) <- 'UTF-8'
#the function outputs an error

> typeof(try1$Skills)
[1] "character"
> typeof(try2$skills2)
[1] "integer"

1 Answer:

Answer 0 (score: 1)

The problem is that data.frame, with stringsAsFactors = TRUE (the default before R 4.0.0), converts the column to a factor:

try2 <- data.frame(try1[sample(nrow(try1),floor(dim(try1)[1]*0.5)),])
colnames(try2) <- 'skills2'
str(try2)
#'data.frame':  3 obs. of  1 variable:
#  $ skills2: Factor w/ 3 levels "\"'\"","\"'Fin Gaap'\"",..: 3 1 2

Encoding(try2$skills2) <- 'UTF-8'
#Error in `Encoding<-`(`*tmp*`, value = "UTF-8") : 
#  a character vector argument expected

try2$skills2 <-as.character(try2$skills2)
Encoding(try2$skills2) <- 'UTF-8'
#works
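
The conversion can also be avoided up front instead of being undone afterwards. A minimal sketch, assuming a shortened stand-in for the try1 data frame from the question and passing stringsAsFactors = FALSE explicitly, so the result does not depend on the default of the installed R version:

```r
# Shortened stand-in for the try1 data frame from the question.
try1 <- data.frame(
  Skills = c("null", "'Fin Gaap'", "Knæ-igennem-hinanden-tr...",
             "Mønt-dans-på-knoerne-tr..."),
  stringsAsFactors = FALSE
)

# Sample half of the rows. With a single column, try1[rows, ] returns a
# plain character vector, which data.frame() then wraps again.
rows <- sample(nrow(try1), floor(nrow(try1) * 0.5))
try2 <- data.frame(skills2 = try1[rows, ], stringsAsFactors = FALSE)

# The column is still character, so the Encoding assignment is accepted.
stopifnot(is.character(try2$skills2))
Encoding(try2$skills2) <- "UTF-8"
```

Because the column is named inside the data.frame call, the separate colnames step is not needed in this variant.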

Of course, you don't need data.frame at all ...
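
One way to read that last remark: subsetting with drop = FALSE keeps the result a data frame, so the data.frame wrapper, and with it the factor conversion, is not needed. A sketch using a shortened stand-in for try1; this is one possible approach and not necessarily what the answerer had in mind:

```r
# Shortened stand-in for the try1 data frame from the question.
try1 <- data.frame(
  Skills = c("null", "'Fin Gaap'", "Knæ-igennem-hinanden-tr...",
             "Mønt-dans-på-knoerne-tr..."),
  stringsAsFactors = FALSE
)

# drop = FALSE stops the single-column subset from collapsing to a vector,
# so the result is still a data frame with a character column.
rows <- sample(nrow(try1), floor(nrow(try1) * 0.5))
try2 <- try1[rows, , drop = FALSE]

colnames(try2) <- "skills2"
Encoding(try2$skills2) <- "UTF-8"
str(try2)
```

Without drop = FALSE the subset returns a bare vector, which is also why the original column name was lost once the result was passed back through data.frame.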