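I have two data.frames (df1, df2), and I want to replace the values in columns P1-P10 with the corresponding values of df1$V2, while keeping the first two columns of df2.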
df1 = data.frame(V1=LETTERS, V2=rnorm(26))
df2 <- data.frame(Name=sample(LETTERS, 6), bd=sample(1:6), P1=sample(LETTERS,6), P2=sample(LETTERS, 6), P3=sample(LETTERS, 6), P4=sample(LETTERS, 6), P5=sample(LETTERS, 6), P6=sample(LETTERS, 6), P7=sample(LETTERS, 6), P8=sample(LETTERS, 6), P9=sample(LETTERS, 6), P10=sample(LETTERS, 6))
My approach is as follows:
df3 <- matrix(setNames(df1[,2], df1[,1])[as.character(unlist(df2[,3:12]))], nrow=6, ncol=10)
df4 <- data.frame(cbind(df2[,1:2], df3))
This gives me the desired output, but my real data has 10,000 columns. Is there a way to avoid the cbind step or to make the process faster?
> df4
Name bd X1 X2 X3 X4 X5 X6 X7 X8 X9 X10
1 V 6 -1.8991102 0.40269050 -0.1517500 -2.5297829 1.5315622 1.4897071 1.364071 -1.2443708 -1.3197276 -0.4917057
2 T 1 -2.5297829 -0.44614123 -0.1894970 -0.6693774 -0.1517500 -1.0650962 -0.151750 -0.4461412 -0.6693774 -1.1351770
3 R 5 -0.6693774 0.09059365 -2.5297829 0.3233827 -0.9383348 -0.4461412 1.281797 1.5315622 1.4897071 -0.4461412
4 B 4 -0.4461412 -0.93833476 -1.2443708 -0.4461412 -0.1894970 -0.9383348 -1.135177 -1.8991102 -0.1894970 0.4026905
5 K 2 -1.0180271 -1.06509624 -0.1939600 -0.1894970 1.4897071 -0.6693774 -1.899110 -1.3197276 1.5315622 -0.1517500
6 Y 3 1.5315622 -0.19396005 -0.4917057 -0.4664239 -1.8991102 0.4026905 -1.065096 -0.9383348 -1.2443708 -0.4664239
Thanks.
Answer 0 (score: 3)
You can match the values of df2[3:12] against df1[[1]]. The resulting row numbers are then used to extract the values from df1[2]:
df2[3:12] <- df1[match(as.character(unlist(df2[3:12])),
                       as.character(df1[[1]])), 2]
The result (df2):
Name bd P1 P2 P3 P4 P5 P6 P7 P8 P9 P10
1 H 5 0.1199355 0.3752010 -0.3926061 -1.1039548 -0.1107821 0.9867373 -0.3360094 -0.7488000 -0.3926061 2.0667704
2 U 4 0.1168599 0.1168599 0.9867373 1.3521418 0.9867373 -0.3360094 -0.7724007 -0.3926061 -0.3360094 -1.2543480
3 R 3 -1.2337890 -0.1107821 -0.7724007 2.0667704 0.3752010 0.4645504 0.9867373 0.1168599 -0.0981773 -0.3926061
4 G 2 -0.3926061 0.3199261 -0.0981773 -0.1107821 2.0667704 -1.1039548 -1.2337890 0.3199261 -1.2337890 -2.1534678
5 C 6 -2.1534678 -1.1039548 -1.1039548 -0.7488000 0.4645504 0.3199261 -2.1534678 -0.3360094 0.9867373 0.8771467
6 I 1 0.6171634 0.6224091 1.8011711 0.7292998 0.8771467 2.0667704 0.3752010 0.4645504 -2.1534678 -0.7724007
If you do not want to replace the values in df2, you can create a new data frame df4 instead:
df4 <- "[<-"(df2, 3:12, value = df1[match(as.character(unlist(df2[3:12])),
                                          as.character(df1[[1]])), 2])
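To make the matching step in this answer easier to follow, here is a minimal sketch on toy data. The data frames lookup_df and target, and the vectors keys and rows, are hypothetical names made up for illustration; they are not part of the question's data.

```r
# Hypothetical toy data, not the data from the question
lookup_df <- data.frame(key = c("A", "B", "C"), value = c(10, 20, 30),
                        stringsAsFactors = FALSE)
target <- data.frame(id = 1:2, P1 = c("C", "A"), P2 = c("B", "B"),
                     stringsAsFactors = FALSE)

# unlist() flattens the P columns, column by column, into one vector
keys <- as.character(unlist(target[2:3]))

# match() gives, for each key, its row position in lookup_df$key
rows <- match(keys, lookup_df$key)

# Assigning a vector of length nrow * ncol fills the columns in order
target[2:3] <- lookup_df$value[rows]
print(target)
```

Because unlist() and the assignment both work column by column, the looked-up values land back in the cells their keys came from.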
Answer 1 (score: 0)
Try some *pply magic:
lookup<-tapply(df1$V2, df1$V1, unique) #Creates a lookup table
lookup.function<-function(x) as.numeric(lookup[as.character(x)]) #The function
df4<-data.frame(df2[,1:2], apply(df2[,3:12], 2,lookup.function )) #Builds the output
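A possible variation on the same idea, sketched under a few assumptions: it builds the lookup as a named vector with setNames (as in the question) and assigns the translated columns back with lapply, so no cbind step is needed. The seed, the reduced number of P columns, and the name p_cols are illustrative choices only, and no claim is made here about its speed.

```r
set.seed(1)  # assumption: fixed seed only so the sketch is reproducible
df1 <- data.frame(V1 = LETTERS, V2 = rnorm(26), stringsAsFactors = FALSE)
df2 <- data.frame(Name = sample(LETTERS, 6), bd = sample(1:6),
                  P1 = sample(LETTERS, 6), P2 = sample(LETTERS, 6),
                  stringsAsFactors = FALSE)

# Named numeric vector: names are the letters, values are df1$V2
lookup <- setNames(df1$V2, df1$V1)

p_cols <- 3:ncol(df2)  # hypothetical name for the columns to translate
df4 <- df2             # copy, so df2 itself is left untouched
df4[p_cols] <- lapply(df2[p_cols],
                      function(x) unname(lookup[as.character(x)]))
print(df4)
```

The first two columns and all column names are carried over from df2 unchanged; only the columns selected by p_cols become numeric.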
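Update:
The *pply family is much faster than the match-based replacement (called merge below), by at least an order of magnitude. Take a look at this: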
num<-1000
df1 = data.frame(V1=LETTERS, V2=rnorm(26))
df2<-data.frame(cbind(first=1:num,second=1:num, matrix(sample(LETTERS, num^2, replace=TRUE), nrow=num, ncol=num)))
start<-Sys.time()
lookup<-tapply(df1$V2, df1$V1, unique)
lookup.function<-function(x) as.numeric(lookup[as.character(x)])
df4<-data.frame(cbind(df2[,1:2], data.frame(apply(df2[,3:(num+2)], 2, lookup.function ))))
(difftime(Sys.time(),start))
start<-Sys.time()
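# Replace all num letter columns (3 to num+2), the same range as in the *pply run
df4.merge <- "[<-"(df2, 3:(num+2), value = df1[match(as.character(unlist(df2[3:(num+2)])), as.character(df1[[1]])), 2])
(difftime(Sys.time(),start))
# Every cell of the two num x (num+2) results should agree
sum(df4==df4.merge)==num*(num+2)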
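For 3000 columns and rows, the *pply combination takes 4.3 s, whereas merge takes about 22 s on my slow Intel machine. It scales well: for 4000 columns and rows, the corresponding times are 7.4 s and 118 s.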
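For anyone who wants to repeat the comparison, the sketch below times both approaches with system.time and checks that they return the same numbers. The small num, the seed, and the names letters_mat, cols, res_apply and res_match are assumptions made for illustration; timings will differ by machine and R version, so no expected figures are given.

```r
num <- 200    # assumption: small size so the sketch runs quickly
set.seed(42)  # assumption: fixed seed for reproducibility
df1 <- data.frame(V1 = LETTERS, V2 = rnorm(26), stringsAsFactors = FALSE)
letters_mat <- matrix(sample(LETTERS, num^2, replace = TRUE), nrow = num)
df2 <- data.frame(first = 1:num, second = 1:num, letters_mat,
                  stringsAsFactors = FALSE)
cols <- 3:(num + 2)  # the letter columns to translate

lookup <- setNames(df1$V2, df1$V1)

# *pply-style lookup, column by column
t_apply <- system.time(
  res_apply <- apply(df2[cols], 2,
                     function(x) as.numeric(lookup[as.character(x)]))
)

# match-based replacement over the same columns
t_match <- system.time(
  res_match <- "[<-"(df2, cols,
                     value = df1[match(as.character(unlist(df2[cols])),
                                       df1$V1), 2])
)

# Both approaches should produce the same numbers
stopifnot(isTRUE(all.equal(unname(res_apply),
                           unname(as.matrix(res_match[cols])))))
print(c(apply = t_apply[["elapsed"]], match = t_match[["elapsed"]]))
```

Increasing num shows how the two timings scale on a given machine.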