Converting a column of character variables to a vector

Time: 2013-04-23 13:49:57

Tags: r vector character-encoding

I have a .csv containing 4 columns of values:

data<-read.csv("C:\\Users\\mtatange\\Desktop\\Dataset.csv")
A         B        C        D   
1         1       NA        1   
2         2        4        1   
3         3        6        4   
4        NA        8        5

data$E<-do.call(paste,c(data[c("A","B","C","D")], sep=""))
data
A         B        C        D       E        
1         1       NA        1      11NA1 
2         2        4        1      2241
3         3        6        4      3364 
4        NA        8        5      4NA85

summary(data)
E
Length: 4
Class: Character
Mode: Character

I need column "E" as a numeric vector; it cannot stay a character variable. I tried:

data$E[is.na(data$E)] <- 0

But the column is still a character variable. How can I convert the column to a numeric vector?

1 answer:

Answer 0 (score: 2)

Get rid of the NAs first:

df[ is.na(df) ] <- 0
df$E <- apply(df,1,function(x) as.numeric(paste0(x , collapse="")))
  A B C D    E
1 1 1 0 1 1101
2 2 2 4 1 2241
3 3 3 6 4 3364
4 4 0 8 5 4085

apply(df , 2 , class )
        A         B         C         D         E 
"numeric" "numeric" "numeric" "numeric" "numeric" 
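The reason the NAs have to be replaced before pasting can be seen in a small sketch. It assumes a stand-in data frame `dat` built inline with the same values as the question's table (the name `dat` and the inline construction are illustrative, not from the original post). Once `paste()` has run, a missing value has become the literal text "NA", so `is.na()` on the pasted column no longer finds anything to replace:

```r
# Hypothetical stand-in for the question's data (built inline, not read from the CSV)
dat <- data.frame(A = 1:4,
                  B = c(1, 2, 3, NA),
                  C = c(NA, 4, 6, 8),
                  D = c(1, 1, 4, 5))

# paste() turns missing values into the literal text "NA"
pasted <- do.call(paste, c(dat, sep = ""))
pasted               # "11NA1" "2241" "3364" "4NA85"

# None of the pasted strings is a real missing value, so nothing would be replaced
any(is.na(pasted))   # FALSE

# Converting directly yields NA (with a warning) for the strings containing "NA"
converted <- suppressWarnings(as.numeric(pasted))
converted            # NA 2241 3364 NA
```

This is why replacing the NAs in the numeric columns, before any pasting happens, is the step that makes the later `as.numeric()` conversion work.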

The apply() solution above conveys the idea. Alternatively, a (relatively) faster way is:

df[ is.na(df) ] <- 0
df$E <- as.numeric(do.call(paste0, df))
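For readers who want to try the vectorized version without the CSV, here is a minimal self-contained sketch. The data frame `dat` is a hypothetical inline copy of the question's table, and selecting the columns by name is an addition of this sketch (so that rerunning the last line does not paste an existing `E` column into the result); the answer itself pastes the whole data frame.

```r
# Hypothetical inline copy of the question's table
dat <- data.frame(A = 1:4,
                  B = c(1, 2, 3, NA),
                  C = c(NA, 4, 6, 8),
                  D = c(1, 1, 4, 5))

# Replace missing values while the columns are still numeric
dat[is.na(dat)] <- 0

# Paste the four columns element-wise, then convert the strings to numbers
dat$E <- as.numeric(do.call(paste0, dat[c("A", "B", "C", "D")]))

is.numeric(dat$E)    # TRUE
dat$E                # 1101 2241 3364 4085
```

Note that `as.numeric()` drops leading zeros, so a row whose first column is 0 would produce a shorter number than the pasted string.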

Replacing the NAs is very fast. Testing on a 3-column table with 300,000 rows on an MBP laptop:

df <- data.frame(a = sample(c(1:9, NA), 3e5, replace = TRUE),
                 b = sample(c(1:9, NA), 3e5, replace = TRUE),
                 c = sample(c(1:9, NA), 3e5, replace = TRUE))
sum(is.na(df))
[1] 90118

system.time( (df[is.na(df)] <- 0 ) )
  user  system elapsed 
 0.250   0.021   0.269 
nrow(df)
 [1] 300000
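The timing above covers only the NA replacement. To check the "(relatively) faster" claim for the two conversion approaches, something along these lines could be used. This is a sketch under assumptions: the seed, the reduced row count `n`, and the variable names are arbitrary choices made here, and the timings will differ from machine to machine, so no expected numbers are shown.

```r
set.seed(1)   # arbitrary seed, only so the sample is reproducible
n <- 2e4      # fewer rows than the 300,000 above so the row-wise version finishes quickly

dat <- data.frame(a = sample(c(1:9, NA), n, replace = TRUE),
                  b = sample(c(1:9, NA), n, replace = TRUE),
                  c = sample(c(1:9, NA), n, replace = TRUE))
dat[is.na(dat)] <- 0

# Row-wise approach: one paste0() call per row
t_row_wise <- system.time(
  e_row_wise <- apply(dat, 1, function(x) as.numeric(paste0(x, collapse = "")))
)

# Vectorized approach: a single paste0() call over whole columns
t_vectorized <- system.time(
  e_vectorized <- as.numeric(do.call(paste0, dat))
)

# Both approaches should give the same numbers
all.equal(unname(e_row_wise), e_vectorized)

# Side-by-side timings
rbind(row_wise = t_row_wise, vectorized = t_vectorized)
```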