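I'm trying to figure out how to add/append a column of values from one table to another in R, based on a condition. I'm sure it's obvious, but I'm having trouble teaching myself how to do it.
It would be great if someone could show me how to do this with sqldf, and also the "normal"/simplest/best way in plain R code.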
library(sqldf)
library(plyr)
x=mtcars
groups=ddply(x, .(gear, cyl), summarise, avgMPG=mean(mpg),avgHP=mean(mpg), .drop=FALSE)
#help below --- I get an error
mtcars$avg_hp=sqldf("select avgHP from groups where mtcars$gear=groups$gear and mtcars$cyl=groups$cyl")
mtcars$avg_mpg=sqldf("select avgMPG from groups where mtcars$gear=groups$gear and mtcars$cyl=groups$cyl")
Answer 0 (score: 1)
The `$` notation is R syntax and is not valid inside a SQL string. The correct syntax for a SQL join is:
sqldf("SELECT * FROM groups
INNER JOIN mtcars
ON mtcars.gear = groups.gear AND mtcars.cyl=groups.cyl")
which gives:
gear cyl avgMPG avgHP mpg cyl disp hp drat wt qsec vs am gear carb
1 3 4 21.500 21.500 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1
2 3 6 19.750 19.750 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
3 3 6 19.750 19.750 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
4 3 8 15.050 15.050 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4
....
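If the goal is to keep every row of mtcars and append only the two averages, a LEFT JOIN with an explicit column list avoids the duplicated gear and cyl columns. The following is a minimal sketch, not the only way to do it: the names `mt` and `grp_means` are illustrative and not from the question, the group means are computed in SQL instead of with ddply, and avgHP is taken from hp on the assumption that this is what was intended (the question's code averages mpg for both columns).

```r
library(sqldf)

# Work on a copy; `mt` and `grp_means` are hypothetical names for this sketch.
mt <- mtcars

# One row per (gear, cyl) combination that actually occurs in the data.
grp_means <- sqldf("
  SELECT gear, cyl, AVG(mpg) AS avgMPG, AVG(hp) AS avgHP
  FROM mt
  GROUP BY gear, cyl
")

# Keep all columns of mt and append only the two averages.
res <- sqldf("
  SELECT m.*, g.avgMPG, g.avgHP
  FROM mt AS m
  LEFT JOIN grp_means AS g
    ON m.gear = g.gear AND m.cyl = g.cyl
")

head(res)
```

Note that sqldf does not carry row names into the result by default, so the car names are lost unless they are first copied into a regular column.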
The equivalent in R, which joins on the columns the two data frames share (gear and cyl), is
merge(mtcars, groups, all.x=TRUE)
which gives
cyl gear mpg disp hp drat wt qsec vs am carb avgMPG avgHP
1 4 3 21.5 120.1 97 3.70 2.465 20.01 1 0 1 21.500 21.500
2 4 4 24.4 146.7 62 3.69 3.190 20.00 1 0 2 26.925 26.925
3 4 4 22.8 140.8 95 3.92 3.150 22.90 1 0 2 26.925 26.925
4 4 4 22.8 108.0 93 3.85 2.320 18.61 1 1 1 26.925 26.925
5 4 4 33.9 71.1 65 4.22 1.835 19.90 1 1 1 26.925 26.925
....
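For a version that does not depend on plyr, the group table can be built with base R's aggregate() and then merged. This is only a sketch under a few assumptions: `grp_means` and `merged` are illustrative names, the `by` columns are spelled out instead of relying on merge() to detect the common ones, and avgHP is computed from hp (in the question's code both averages come from mpg, which is why the two columns above are identical).

```r
# Group means per (gear, cyl); aggregate() returns only combinations that occur.
grp_means <- aggregate(mtcars[c("mpg", "hp")],
                       by = mtcars[c("gear", "cyl")],
                       FUN = mean)
names(grp_means)[3:4] <- c("avgMPG", "avgHP")

# Left join: every row of mtcars is kept and gets its group's averages.
merged <- merge(mtcars, grp_means, by = c("gear", "cyl"), all.x = TRUE)

head(merged)
```

As with the merge() call above, the result is sorted by the key columns and the row names (car names) are dropped.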
You can also use the ave() function to do the grouped calculation in a single step, without building a separate summary table:
within(mtcars, {
avgMPG <- ave(mpg, gear, cyl, FUN = mean)
avgHP <- ave(mpg, gear, cyl, FUN = mean)
})
mpg cyl disp hp drat wt qsec vs am gear carb avgHP avgMPG
Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4 19.750 19.750
Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4 19.750 19.750
Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1 26.925 26.925
Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1 19.750 19.750
Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2 15.050 15.050
Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1 19.750 19.750
.....
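In the output above avgHP and avgMPG are identical because both are computed from mpg, mirroring the question's code. Assuming the horsepower average was meant to come from the hp column, a variant might look like the sketch below; `mt2` is an illustrative name.

```r
# ave() returns a vector as long as its input, with each element replaced by
# its group's summary, so the result can be assigned straight to a new column.
mt2 <- within(mtcars, {
  avgMPG <- ave(mpg, gear, cyl, FUN = mean)
  avgHP  <- ave(hp,  gear, cyl, FUN = mean)  # hp here, not mpg (assumed intent)
})

head(mt2[c("mpg", "hp", "gear", "cyl", "avgMPG", "avgHP")])
```

Unlike merge(), this keeps the original row order and row names, since no join takes place. FUN has to be passed by name because ave() treats unnamed arguments after the first as grouping variables.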