I have a data frame like the following:
     A    B    C
[1,] "A1" "B3" "C1"
[2,] "A2" "B1" "C2"
[3,] "A3" "B3" "C3"
[4,] "A1" "B2" "C3"
[5,] "A3" "B3" "C2"
[6,] "A1" "B1" "C1"
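I want to reshape it so that each unique value of a variable becomes its own column, with 1/0 marking whether the row has that value. The data frame above should become:
     A    B1  B2  B3  C1  C2  C3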
[1,] "A1" "0" "0" "1" "1" "0" "0"
[2,] "A2" "1" "0" "0" "0" "1" "0"
[3,] "A3" "0" "0" "1" "0" "0" "1"
[4,] "A1" "0" "1" "0" "0" "0" "1"
[5,] "A3" "0" "0" "1" "0" "1" "0"
[6,] "A1" "1" "0" "0" "1" "0" "0"
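The real data is large (more than 100,000 rows per day, with more fields and more unique values), so I need an efficient procedure rather than for ......
I'm sure you can help... I'm a beginner and only know ...... :(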
Answer 0 (score: 0)
You could also try this with base R:
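df[c("B", "C")] <- lapply(df[c("B", "C")], factor)  # contrasts() needs factors
# contrasts = FALSE keeps an indicator column for every level of B and C
mm <- model.matrix(~ B + C + 0, df,
                   list(B = contrasts(df$B, contrasts = FALSE),
                        C = contrasts(df$C, contrasts = FALSE)))
res <- cbind(as.character(df$A), mm)
dimnames(res) <- list(NULL, c("A", levels(df$B), levels(df$C)))
res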
#     A    B1  B2  B3  C1  C2  C3
#[1,] "A1" "0" "0" "1" "1" "0" "0"
#[2,] "A2" "1" "0" "0" "0" "1" "0"
#[3,] "A3" "0" "0" "1" "0" "0" "1"
#[4,] "A1" "0" "1" "0" "0" "0" "1"
#[5,] "A3" "0" "0" "1" "0" "1" "0"
#[6,] "A1" "1" "0" "0" "1" "0" "0"
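If there are many columns to expand, naming each one in the formula gets tedious. The following is a minimal base R sketch of one alternative, not the method used above: it builds each indicator block with matrix indexing, so there is no explicit loop over rows. The helper name `one_hot` and its `cols` argument are hypothetical, the sample data is rebuilt from the question, and the sketch assumes the expanded columns contain no missing values.

```r
# Sample data from the question (assumed here to be a data frame of factors)
df <- data.frame(
  A = c("A1", "A2", "A3", "A1", "A3", "A1"),
  B = c("B3", "B1", "B3", "B2", "B3", "B1"),
  C = c("C1", "C2", "C3", "C3", "C2", "C1"),
  stringsAsFactors = TRUE
)

# Hypothetical helper: one 0/1 column per level of each column named in `cols`.
# Assumes no NA values in those columns.
one_hot <- function(data, cols) {
  blocks <- lapply(cols, function(col) {
    f <- factor(data[[col]])
    m <- matrix(0L, nrow = length(f), ncol = nlevels(f),
                dimnames = list(NULL, levels(f)))
    # Matrix indexing: set cell (row i, level of row i) to 1 for all rows at once
    m[cbind(seq_along(f), as.integer(f))] <- 1L
    m
  })
  do.call(cbind, blocks)
}

res <- data.frame(A = df$A, one_hot(df, c("B", "C")))
res
```

Unlike the `cbind()` result above, `res` here is a data frame whose indicator columns are integers rather than character strings. If two expanded columns shared a level name, their indicator columns would end up with duplicate names, so this sketch assumes level names are unique across columns.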
Answer 1 (score: -1)
We can use mtabulate from the qdapTools package:
library(qdapTools)
cbind(df1[1], mtabulate(as.data.frame(t(df1[-1]))))
#    A B3 C1 B1 C2 C3 B2
#V1 A1  1  1  0  0  0  0
#V2 A2  0  0  1  1  0  0
#V3 A3  1  0  0  0  1  0
#V4 A1  0  0  0  0  1  1
#V5 A3  1  0  0  1  0  0
#V6 A1  0  1  1  0  0  0
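Note that the columns above come out in order of first appearance, not in the B1 ... C3 order the question asks for. As a rough sketch of the same counting idea without the extra package, base R's `table()` can tabulate row index against value, and it sorts the value columns alphabetically. The names `values`, `rows`, and `counts` are illustrative, and `df1` is rebuilt here from the question's sample.

```r
# Sample data from the question, stored as df1 to match the snippet above
df1 <- data.frame(
  A = c("A1", "A2", "A3", "A1", "A3", "A1"),
  B = c("B3", "B1", "B3", "B2", "B3", "B1"),
  C = c("C1", "C2", "C3", "C3", "C2", "C1")
)

# Stack every column except A into one long vector, column by column
values <- unlist(lapply(df1[-1], as.character), use.names = FALSE)
# Row index for each stacked value: 1..n repeated once per stacked column
rows <- rep(seq_len(nrow(df1)), times = ncol(df1) - 1)

# Cross-tabulate row index against value; columns are sorted by value name
counts <- table(rows, values)

res <- cbind(df1[1], as.data.frame.matrix(counts))
res
```

Because this counts occurrences, a value that appeared in two different columns of the same row would show up as 2 instead of 1, so the sketch assumes value labels are unique across columns.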