I need to translate the following SQL code into R.
SELECT col1, col2, col3, col4, col5, COUNT(1) AS newcol,
       SUM(othercol) AS newcol2, SUM(othercol*othercol2) AS newcol3
FROM df
WHERE 'some conditions'
GROUP BY col1, col2, col3, col4, col5;
I understand how SELECT, GROUP BY, COUNT(1), SUM() and AS work individually, but not how they work together as in the code above, mainly what COUNT(1) and SUM() are doing.
Answer 0 (score: 3)
Since the OP did not provide a reproducible example, here is SQL syntax that works (using sqldf):
library(sqldf)
sqldf("select col1, col2, COUNT(1) as newcol,
sum(othercol) as newcol2
from df
where col1 = 1
group by col1, col2")
# col1 col2 newcol newcol2
#1 1 a 2 -0.009295454
#2 1 b 2 -0.164004051
The same can also be done with R methods, for example with data.table:
library(data.table)
setDT(df)[col1==1, .(newcol=.N, newcol2 = sum(othercol)), .(col1, col2)]
# col1 col2 newcol newcol2
#1: 1 a 2 -0.009295454
#2: 1 b 2 -0.164004051
Or using dplyr:
library(dplyr)
df %>%
  filter(col1 == 1) %>%
  group_by(col1, col2) %>%
  summarise(newcol = n(), newcol2 = sum(othercol))
# sample data used by the examples above
set.seed(24)
df <- data.frame(col1 = rep(1:4, each = 4),
                 col2 = rep(letters[1:2], each = 2),
                 othercol = rnorm(16), othercol2 = runif(16))
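To address the "how do COUNT(1) and SUM() work together" part of the question, here is a minimal base R sketch that spells out the steps the query performs: filter the rows (WHERE), split them into groups (GROUP BY), then collapse each group into one row (SELECT with aggregates). The data frame `toy` and the condition `col1 == 1` are made up for illustration and are not the OP's data; the third aggregate, `newcol3`, mirrors `SUM(othercol*othercol2)` from the question.

```r
# Hypothetical toy data, small enough to check by hand (not the OP's data)
toy <- data.frame(
  col1      = c(1, 1, 1, 2),
  col2      = c("a", "a", "b", "a"),
  othercol  = c(10, 20, 30, 40),
  othercol2 = c(1, 2, 3, 4)
)

# WHERE: keep only the rows that satisfy the (assumed) condition
kept <- toy[toy$col1 == 1, ]

# GROUP BY: one piece per (col1, col2) combination that actually occurs
pieces <- split(kept, list(kept$col1, kept$col2), drop = TRUE)

# SELECT: collapse every piece into a single output row
result <- do.call(rbind, lapply(pieces, function(g) {
  data.frame(
    col1    = g$col1[1],
    col2    = g$col2[1],
    newcol  = nrow(g),                       # COUNT(1): number of rows in the group
    newcol2 = sum(g$othercol),               # SUM(othercol)
    newcol3 = sum(g$othercol * g$othercol2)  # multiply row by row, then add up
  )
}))
rownames(result) <- NULL
print(result)
#   col1 col2 newcol newcol2 newcol3
# 1    1    a      2      30      50
# 2    1    b      1      30      90
```

So COUNT(1) simply counts the rows in each group (the `1` is a constant evaluated once per row), and each SUM() adds its expression over the rows of the same group. In the data.table and dplyr versions, `.N` and `n()` play the role of `nrow(g)` here.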