I have a data frame like this:
originalDF <- data.frame(A1=c(1, 1, 2, 3, 4, 5, 6, 6, 6, 6, 6),
                         A2=c(12.2, 12.2, 15.0, 34.123, 2.0, 66.0, 7.0, 7.0, 7.0, 7.0, 7.0),
                         A3=c('T1', 'T2', 'T1', 'T1', 'T2', 'T1', 'T1', 'T1', 'T1', 'T1', 'T1'),
                         A4=c('1234', '1234', '1234', '1234', '4321', '4321', '4321', '4321', '4321', '4321', '4321'),
                         A5=c('0245', '0245', '0500', '0500', '0600', '0600', '0600', '0800', '0700', '0900', '0900'))
   A1     A2 A3   A4   A5
1   1 12.200 T1 1234 0245
2   1 12.200 T2 1234 0245
3   2 15.000 T1 1234 0500
4   3 34.123 T1 1234 0500
5   4  2.000 T2 4321 0600
6   5 66.000 T1 4321 0600
7   6  7.000 T1 4321 0600
8   6  7.000 T1 4321 0800
9   6  7.000 T1 4321 0700
10  6  7.000 T1 4321 0900
11  6  7.000 T1 4321 0900
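I now want to dichotomize this data frame, turning each distinct value of A5 into its own 0/1 indicator column, so that it ends up looking like this: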
uniqueoriginalDF <- unique(subset(originalDF, select=c(A1, A2, A3, A4)))
wantedDF <- cbind.data.frame(uniqueoriginalDF,
                             A5_0245=c(1, 1, 0, 0, 0, 0, 0),
                             A5_0500=c(0, 0, 1, 1, 0, 0, 0),
                             A5_0600=c(0, 0, 0, 0, 1, 1, 1),
                             A5_0800=c(0, 0, 0, 0, 0, 0, 1),
                             A5_0700=c(0, 0, 0, 0, 0, 0, 1),
                             A5_0900=c(0, 0, 0, 0, 0, 0, 1))
  A1     A2 A3   A4 A5_0245 A5_0500 A5_0600 A5_0800 A5_0700 A5_0900
1  1 12.200 T1 1234       1       0       0       0       0       0
2  1 12.200 T2 1234       1       0       0       0       0       0
3  2 15.000 T1 1234       0       1       0       0       0       0
4  3 34.123 T1 1234       0       1       0       0       0       0
5  4  2.000 T2 4321       0       0       1       0       0       0
6  5 66.000 T1 4321       0       0       1       0       0       0
7  6  7.000 T1 4321       0       0       1       1       1       1
How can I do this? (A base R solution is preferred!) Thanks in advance!
Answer 0: (score: 1)
We can use reshape from base R:
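# unique() drops the repeated 0900 row so reshape() does not warn about multiple matches
d1 <- reshape(transform(unique(originalDF), A5N = 1),
              idvar = names(originalDF)[1:4], timevar = 'A5', direction = 'wide')
# Unobserved combinations come back as NA; the new columns are named A5N.0245, A5N.0500, ...
d1[is.na(d1)] <- 0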
Or we can use dcast from data.table:
library(data.table)
dcast(setDT(originalDF), ...~ paste0("A5_", A5), function(x) as.integer(length(x) > 0))
#    A1     A2 A3   A4 A5_0245 A5_0500 A5_0600 A5_0700 A5_0800 A5_0900
#1:  1 12.200 T1 1234       1       0       0       0       0       0
#2:  1 12.200 T2 1234       1       0       0       0       0       0
#3:  2 15.000 T1 1234       0       1       0       0       0       0
#4:  3 34.123 T1 1234       0       1       0       0       0       0
#5:  4  2.000 T2 4321       0       0       1       0       0       0
#6:  5 66.000 T1 4321       0       0       1       0       0       0
#7:  6  7.000 T1 4321       0       0       1       1       1       1
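Since the question asks for base R, here is another possible sketch that uses table() instead of reshape. It is not part of the original answer. The names id_cols, key, keys, a5_values, counts, flags and result are made up for illustration, and the "|" separator assumes that character never appears in the data.

```r
originalDF <- data.frame(
  A1 = c(1, 1, 2, 3, 4, 5, 6, 6, 6, 6, 6),
  A2 = c(12.2, 12.2, 15.0, 34.123, 2.0, 66.0, 7.0, 7.0, 7.0, 7.0, 7.0),
  A3 = c('T1', 'T2', 'T1', 'T1', 'T2', 'T1', 'T1', 'T1', 'T1', 'T1', 'T1'),
  A4 = c('1234', '1234', '1234', '1234', '4321', '4321', '4321', '4321', '4321', '4321', '4321'),
  A5 = c('0245', '0245', '0500', '0500', '0600', '0600', '0600', '0800', '0700', '0900', '0900')
)

# Hypothetical helper names; "|" is assumed not to occur in the data.
id_cols <- c("A1", "A2", "A3", "A4")

# One key string per row, identifying its A1..A4 combination
key <- do.call(paste, c(originalDF[id_cols], sep = "|"))
keys <- unique(key)                   # combinations in order of first appearance
a5_values <- unique(originalDF$A5)    # A5 values in order of first appearance

# counts[i, j] = number of rows with combination i and A5 value j
counts <- table(factor(key, levels = keys),
                factor(originalDF$A5, levels = a5_values))

# Collapse counts to 0/1 flags and name the columns A5_<value>
flags <- matrix(as.integer(counts > 0), nrow = length(keys),
                dimnames = list(NULL, paste0("A5_", a5_values)))

# unique() keeps first-appearance order, so its rows line up with the rows of flags
result <- cbind(unique(originalDF[id_cols]), flags)
print(result)
```

Because the factor levels are taken in order of first appearance, the indicator columns come out in the same order as in wantedDF (0800 before 0700), rather than sorted as in the dcast output.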