R: Dichotomize a data frame over multiple unique dimensions

Date: 2017-07-04 09:23:58

Tags: r

I have a data frame like this:

originalDF <- data.frame(A1=c(1, 1, 2, 3, 4, 5, 6, 6, 6, 6, 6), 
                         A2=c(12.2, 12.2, 15.0, 34.123, 2.0, 66.0, 7.0, 7.0, 7.0, 7.0, 7.0), 
                         A3=c('T1', 'T2', 'T1', 'T1', 'T2', 'T1', 'T1', 'T1', 'T1', 'T1', 'T1'), 
                         A4=c('1234', '1234', '1234', '1234', '4321', '4321', '4321', '4321', '4321', '4321', '4321'),
                         A5=c('0245', '0245', '0500', '0500', '0600', '0600', '0600','0800','0700','0900', '0900'))
   A1     A2 A3   A4   A5
1   1 12.200 T1 1234 0245
2   1 12.200 T2 1234 0245
3   2 15.000 T1 1234 0500
4   3 34.123 T1 1234 0500
5   4  2.000 T2 4321 0600
6   5 66.000 T1 4321 0600
7   6  7.000 T1 4321 0600
8   6  7.000 T1 4321 0800
9   6  7.000 T1 4321 0700
10  6  7.000 T1 4321 0900
11  6  7.000 T1 4321 0900

I now want to dichotomize this data frame so that it ends up looking like this:

uniqueoriginalDF <- unique(subset(originalDF, select=c(A1, A2, A3, A4)))
wantedDF <- cbind.data.frame(uniqueoriginalDF, 
                             A5_0245=c(1, 1, 0, 0, 0, 0, 0), 
                             A5_0500=c(0, 0, 1, 1, 0, 0, 0), 
                             A5_0600=c(0, 0, 0, 0, 1, 1, 1), 
                             A5_0800=c(0, 0, 0, 0, 0, 0, 1), 
                             A5_0700=c(0, 0, 0, 0, 0, 0, 1), 
                             A5_0900=c(0, 0, 0, 0, 0, 0, 1))
  A1     A2 A3   A4 A5_0245 A5_0500 A5_0600 A5_0800 A5_0700 A5_0900
1  1 12.200 T1 1234       1       0       0       0       0       0
2  1 12.200 T2 1234       1       0       0       0       0       0
3  2 15.000 T1 1234       0       1       0       0       0       0
4  3 34.123 T1 1234       0       1       0       0       0       0
5  4  2.000 T2 4321       0       0       1       0       0       0
6  5 66.000 T1 4321       0       0       1       0       0       0
7  6  7.000 T1 4321       0       0       1       1       1       1

How can I do this? (A base R solution is preferred!) Thanks in advance!

1 Answer:

Answer 0 (score: 1)

We can use reshape from base R:
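# add a constant indicator column (A5N = 1), then spread A5 into one column per value
d1 <- reshape(transform(originalDF, A5N = 1), idvar = names(originalDF)[1:4],
              timevar = 'A5', direction = 'wide')
# A1..A4/A5 combinations that never occur come back as NA; set them to 0
d1[is.na(d1)] <- 0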
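If the column names should match wantedDF exactly, the reshape output can be tidied a little further in base R. The sketch below is one possible way to do it, not part of the original answer; it assumes originalDF as defined in the question (redefined so the block runs on its own), and the intermediate name long is purely illustrative. Dropping exact duplicate rows first (rows 10 and 11 are identical) means each A1..A4/A5 pair occurs only once, so reshape should not have to choose among multiple matches, and sub() rewrites reshape's default "A5N." column prefix to "A5_".

```r
originalDF <- data.frame(A1 = c(1, 1, 2, 3, 4, 5, 6, 6, 6, 6, 6),
                         A2 = c(12.2, 12.2, 15.0, 34.123, 2.0, 66.0, 7.0, 7.0, 7.0, 7.0, 7.0),
                         A3 = c('T1', 'T2', 'T1', 'T1', 'T2', 'T1', 'T1', 'T1', 'T1', 'T1', 'T1'),
                         A4 = c('1234', '1234', '1234', '1234', '4321', '4321', '4321', '4321', '4321', '4321', '4321'),
                         A5 = c('0245', '0245', '0500', '0500', '0600', '0600', '0600', '0800', '0700', '0900', '0900'))

# 'long' is an illustrative name: constant indicator column, exact duplicate rows removed
long <- unique(transform(originalDF, A5N = 1))

# one row per unique A1..A4 combination, one A5N.<value> column per A5 value
d1 <- reshape(long, idvar = names(originalDF)[1:4], timevar = "A5", direction = "wide")

d1[is.na(d1)] <- 0                              # absent A5 values become 0
names(d1) <- sub("^A5N\\.", "A5_", names(d1))   # e.g. A5N.0245 -> A5_0245
d1
```

The new columns follow the order in which A5 values first appear (0245, 0500, 0600, 0800, 0700, 0900), which is the same order used in wantedDF.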

but it would be easier with dcast from data.table:
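library(data.table)
# setDT() converts originalDF by reference; each cell is 1 if that A5 value occurs for the A1..A4 row, else 0
dcast(setDT(originalDF), ... ~ paste0("A5_", A5), value.var = "A5",
      fun.aggregate = function(x) as.integer(length(x) > 0))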
#    A1     A2 A3   A4 A5_0245 A5_0500 A5_0600 A5_0700 A5_0800 A5_0900
#1:  1 12.200 T1 1234       1       0       0       0       0       0
#2:  1 12.200 T2 1234       1       0       0       0       0       0
#3:  2 15.000 T1 1234       0       1       0       0       0       0
#4:  3 34.123 T1 1234       0       1       0       0       0       0
#5:  4  2.000 T2 4321       0       0       1       0       0       0
#6:  5 66.000 T1 4321       0       0       1       0       0       0
#7:  6  7.000 T1 4321       0       0       1       1       1       1
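The length(x) > 0 aggregate in the dcast call only records whether a given A5 value occurs for an A1..A4 combination. Since the question prefers base R, here is a minimal sketch of that same presence check without any reshaping, using match() and two-column matrix indexing. It is an illustration, not the answer's method; key, ids, grp, lev and m are hypothetical helper names, and originalDF is redefined from the question so the block is self-contained.

```r
originalDF <- data.frame(A1 = c(1, 1, 2, 3, 4, 5, 6, 6, 6, 6, 6),
                         A2 = c(12.2, 12.2, 15.0, 34.123, 2.0, 66.0, 7.0, 7.0, 7.0, 7.0, 7.0),
                         A3 = c('T1', 'T2', 'T1', 'T1', 'T2', 'T1', 'T1', 'T1', 'T1', 'T1', 'T1'),
                         A4 = c('1234', '1234', '1234', '1234', '4321', '4321', '4321', '4321', '4321', '4321', '4321'),
                         A5 = c('0245', '0245', '0500', '0500', '0600', '0600', '0600', '0800', '0700', '0900', '0900'))

# key, ids, grp, lev, m are illustrative helper names
key_cols <- c("A1", "A2", "A3", "A4")

# one string key per row identifying its A1..A4 combination
key <- do.call(paste, c(originalDF[key_cols], sep = "\t"))

ids <- originalDF[!duplicated(key), key_cols]  # unique combinations, first-appearance order
grp <- match(key, unique(key))                 # which row of ids each original row maps to
lev <- unique(as.character(originalDF$A5))     # A5 values, first-appearance order

# 0/1 indicator matrix: put a 1 wherever a (combination, A5 value) pair occurs
m <- matrix(0L, nrow = nrow(ids), ncol = length(lev),
            dimnames = list(NULL, paste0("A5_", lev)))
m[cbind(grp, match(as.character(originalDF$A5), lev))] <- 1L

result <- cbind(ids, m)
result
```

Unlike dcast, which sorts the new columns (A5_0700 before A5_0800), this keeps the A5 values in order of first appearance, as in wantedDF. Repeated pairs such as the two 0900 rows simply write the same 1 twice.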