I have the following data structures:
x <- read.table(header=T, text="
variable class value
a a1 1
a a2 2
a a3 3
b b1 4
b b2 5
b b3 6
c c1 7
c c2 8
c a3 9")
y <- read.table(header=T, text="
a b c
a1 b2 c2
a2 b1 c1
a3 b3 a3"
)
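Now I need to add three variables to the data frame y: a_out, b_out and c_out. For each column of y, I need to look up the matching x$value, using the column name (x$variable) and the cell value (x$class) as the key. The output should look like this: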
a b c a_out b_out c_out
a1 b2 c2 1 5 8
a2 b1 c1 2 4 7
a3 b3 a3 3 6 9
I can do this with sqldf:
library(sqldf)
sqldf("select y.*, x1.value as a_out, x2.value as b_out, x3.value as c_out
from
y
join x as x1 on (x1.class=y.a and x1.variable='a')
join x as x2 on (x2.class=y.b and x2.variable='b')
join x as x3 on (x3.class=y.c and x3.variable='c')
")
In my real data I have many columns (50+), so I am looking for something more elegant.
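Answer 0: (score: 2)
I'm sure there is a more elegant way to do this, and I'm not 100% sure I understand what you are trying to do, but I think this should do the trick: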
for(col in names(y)){
tmp <- x[x$variable == col,c("class","value")]
y[,paste0(col,"_out")] <- tmp$value[match(as.character(y[,col]),as.character(tmp$class))]
}
a b c a_out b_out c_out
1 a1 b2 c2 1 5 8
2 a2 b1 c1 2 4 7
3 a3 b3 a3 3 6 9
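The same lookup can also be written without an explicit loop, which may be convenient with 50+ columns. This is only a sketch, assuming every column name of y also occurs in x$variable; the helper name `lookup_col` is made up for illustration, and any class with no match in x comes back as NA.

```r
# Same data as in the question, built directly so the example is self-contained
x <- data.frame(
  variable = rep(c("a", "b", "c"), each = 3),
  class    = c("a1", "a2", "a3", "b1", "b2", "b3", "c1", "c2", "a3"),
  value    = 1:9,
  stringsAsFactors = FALSE
)
y <- data.frame(
  a = c("a1", "a2", "a3"),
  b = c("b2", "b1", "b3"),
  c = c("c2", "c1", "a3"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: for one column of y, return the matching x$value
lookup_col <- function(col) {
  tmp <- x[x$variable == col, ]
  tmp$value[match(as.character(y[[col]]), as.character(tmp$class))]
}

# Apply the helper to every column and name the results "<col>_out"
out <- lapply(names(y), lookup_col)
names(out) <- paste0(names(y), "_out")

res <- cbind(y, as.data.frame(out))
res
```

Restricting x to one variable before calling match is what keeps the class a3 under variable c from being confused with the a3 under variable a.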
Answer 1: (score: 2)
Here is another approach:
## Convert "y" to a long data.frame
y2 <- stack(lapply(y, as.character))
## Reorder "x" according to "y2": look up each row of "y2" in "x"
x2 <- x[match(do.call(paste, rev(y2)), do.call(paste, x[1:2])), ]
## Use ave to generate an "id" variable (running row number within each variable)
x2$id <- ave(seq_len(nrow(x2)), x2$variable, FUN = seq_along)
## "x2" now looks like this
x2
# variable class value id
# 1 a a1 1 1
# 2 a a2 2 2
# 3 a a3 3 3
# 5 b b2 5 1
# 4 b b1 4 2
# 6 b b3 6 3
# 8 c c2 8 1
# 7 c c1 7 2
# 9 c a3 9 3
## Use reshape to get your data in the wide format that you are looking for
reshape(x2, direction = "wide", idvar = "id", timevar = "variable")
# id class.a value.a class.b value.b class.c value.c
# 1 1 a1 1 b2 5 c2 8
# 2 2 a2 2 b1 4 c1 7
# 3 3 a3 3 b3 6 a3 9
From there, it is pretty much cosmetic work: use sub / gsub to rename the columns, and reorder them if necessary.
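The cosmetic step could look roughly like the following. This is a sketch under the assumption that the reshaped result has exactly the `class.<name>` / `value.<name>` column pattern shown above; the object names `wide`, `nm`, `keys` and `res` are illustrative, and `wide` is rebuilt by hand here only so the snippet runs on its own.

```r
# Stand-in for the result of reshape(..., direction = "wide") shown above
wide <- data.frame(
  id      = 1:3,
  class.a = c("a1", "a2", "a3"), value.a = c(1L, 2L, 3L),
  class.b = c("b2", "b1", "b3"), value.b = c(5L, 4L, 6L),
  class.c = c("c2", "c1", "a3"), value.c = c(8L, 7L, 9L),
  stringsAsFactors = FALSE
)

nm <- names(wide)
nm <- sub("^class\\.", "", nm)               # class.a -> a
nm <- sub("^value\\.(.*)$", "\\1_out", nm)   # value.a -> a_out
names(wide) <- nm

# Original columns first, then the "_out" columns; the helper "id" is dropped
keys <- nm[nm != "id" & !grepl("_out$", nm)]
res  <- wide[c(keys, paste0(keys, "_out"))]
res
```

Deriving `keys` from the renamed columns avoids typing out 50+ names by hand.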