I have a .csv with four columns of values:
data<-read.csv("C:\\Users\\mtatange\\Desktop\\Dataset.csv")
A B C D
1 1 NA 1
2 2 4 1
3 3 6 4
4 NA 8 5
data$E<-do.call(paste,c(data[c("A","B","C","D")], sep=""))
data
A B C D E
1 1 NA 1 11NA1
2 2 4 1 2241
3 3 6 4 3364
4 NA 8 5 4NA85
summary(data)
E
Length: 4
Class: Character
Mode: Character
I need column "E" as a vector; it cannot stay a character variable. I tried:
data$E[is.na(data$E)]<-0
But the column is still a character variable. How can I convert the column to a vector variable?
Answer 0: (score: 2)
Get rid of the NAs first...:
df[ is.na(df) ] <- 0
df$E <- apply(df,1,function(x) as.numeric(paste0(x , collapse="")))
A B C D E
1 1 1 0 1 1101
2 2 2 4 1 2241
3 3 3 6 4 3364
4 4 0 8 5 4085
apply(df , 2 , class )
A B C D E
"numeric" "numeric" "numeric" "numeric" "numeric"
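One caveat about the check above: `apply()` first converts the data frame to a matrix, so it reports one common type for every column. The sketch below shows the difference on a column that is still character. It rebuilds the question's data by hand in a hypothetical data frame `df_chk` (the name is an assumption, not from the thread) and compares `apply()` with `sapply()`, which inspects each column separately.

```r
# Hypothetical copy of the question's data, with E still built by paste()
df_chk <- data.frame(A = c(1, 2, 3, 4), B = c(1, 2, 3, NA),
                     C = c(NA, 4, 6, 8), D = c(1, 1, 4, 5))
df_chk$E <- do.call(paste, c(df_chk[c("A", "B", "C", "D")], sep = ""))

# apply() coerces everything to a character matrix, so all columns look alike
classes_apply <- apply(df_chk, 2, class)

# sapply() checks each column of the data frame on its own
classes_sapply <- sapply(df_chk, class)
```

Here `classes_apply` reports "character" for all five columns, while `classes_sapply` reports "numeric" for A to D and "character" only for E, so `sapply(df, class)` is probably the safer check when column types are mixed.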
The solution above gives you the idea. Alternatively, a (relatively) faster way is:
df[ is.na(df) ] <- 0
df$E <- as.numeric(do.call(paste0, df))
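To make this reproducible without the .csv, here is a minimal sketch that rebuilds the question's four columns by hand. The data frame name `dat` and the helper vector `pasted` are assumptions for illustration. It also shows why the attempt in the question has no effect: pasting turns `NA` into the string "NA", so `is.na()` no longer finds anything to replace.

```r
# Rebuild the question's data by hand instead of reading the .csv
dat <- data.frame(A = c(1, 2, 3, 4), B = c(1, 2, 3, NA),
                  C = c(NA, 4, 6, 8), D = c(1, 1, 4, 5))

# paste0() turns NA into the string "NA", so is.na() finds nothing afterwards
pasted <- do.call(paste0, dat)
any(is.na(pasted))   # FALSE

# Replace the NAs before pasting, then convert the pasted strings to numeric
dat[is.na(dat)] <- 0
dat$E <- as.numeric(do.call(paste0, dat))

is.numeric(dat$E)    # TRUE
dat$E                # 1101 2241 3364 4085
```

Note that `as.numeric()` drops leading zeros, so a row whose first column was `NA` (replaced by 0) would lose its first digit.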
Replacing the NAs is very fast. Tested on 300,000 rows in a 3-column table on an MBP laptop...
df <- data.frame( a = sample(c(1:9,NA) , 3e5 , replace = TRUE ) ,
                  b = sample(c(1:9,NA) , 3e5 , replace = TRUE ) ,
                  c = sample(c(1:9,NA) , 3e5 , replace = TRUE ) )
sum(is.na(df))
[1] 90118
system.time( (df[is.na(df)] <- 0 ) )
user system elapsed
0.250 0.021 0.269
nrow(df)
[1] 300000
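The timing above covers only the NA replacement. To compare the two ways of building the combined column, a rough sketch along the following lines could be used. The names `bench`, `n`, `e_apply` and `e_vec` are assumptions, the seed is arbitrary, and the row count is kept deliberately smaller than 300,000 so the row-wise version finishes quickly. No timings are quoted here because they depend on the machine.

```r
set.seed(1)  # arbitrary seed, only so the sample is reproducible
n <- 1e4     # assumed size, smaller than the 3e5 rows used above

bench <- data.frame(a = sample(c(1:9, NA), n, replace = TRUE),
                    b = sample(c(1:9, NA), n, replace = TRUE),
                    c = sample(c(1:9, NA), n, replace = TRUE))
bench[is.na(bench)] <- 0

# Row-wise version: paste each row, then convert it to numeric
t_apply <- system.time(
  e_apply <- apply(bench, 1, function(x) as.numeric(paste0(x, collapse = "")))
)

# Vectorised version: paste whole columns in a single call
t_vec <- system.time(
  e_vec <- as.numeric(do.call(paste0, bench))
)

# Both approaches should produce the same numbers
all.equal(unname(e_apply), e_vec)

# Compare the elapsed times on your own machine
t_apply["elapsed"]
t_vec["elapsed"]
```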