Question

I need to translate the following SQL code to R.

SELECT col1, col2, col3, col4, col5, COUNT(1) AS newcol, 
SUM(othercol) AS newcol2, SUM(othercol*othercol2) AS newcol3 FROM df
WHERE 'some conditions'
GROUP BY col1, col2, col3, col4, col5;

I understand how SELECT, GROUP BY, COUNT(1), SUM() and AS work individually, but not how they work together as in the code above, mainly what COUNT(1) and SUM() are doing.

Answer 1

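Since the OP did not provide a reproducible example, here is SQL syntax that works as-is through sqldf on the sample data given at the end: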

library(sqldf)
sqldf("select col1, col2, COUNT(1) as newcol, 
       sum(othercol) as newcol2 
       from df 
       where col1 = 1 
       group by col1, col2")
#  col1 col2 newcol      newcol2
#1    1    a      2 -0.009295454
#2    1    b      2 -0.164004051

The above can also be done using R methods, for example with data.table

library(data.table)
setDT(df)[col1==1, .(newcol=.N, newcol2 = sum(othercol)), .(col1, col2)]
#   col1 col2 newcol      newcol2
#1:    1    a      2 -0.009295454
#2:    1    b      2 -0.164004051

Or using dplyr

library(dplyr)
df %>%
    filter(col1 == 1) %>%
    group_by(col1, col2) %>%
    summarise(newcol = n(), newcol2 = sum(othercol))

Data

set.seed(24)
df <- data.frame(col1 = rep(1:4, each = 4), col2 = rep(letters[1:2], 
  each = 2), othercol = rnorm(16), othercol2 = runif(16))
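
The examples above leave out newcol3 from the question. As a rough sketch in base R (no packages), and assuming a small made-up data frame rather than the data above, COUNT(1) can be mimicked by summing a column of ones within each group, and SUM(othercol*othercol2) by summing a precomputed product column. The helper columns `one` and `prod`, the name `kept`, and the toy values are illustrative assumptions, not part of the original answer:

```r
# Hypothetical toy data, chosen so the results are easy to check by hand
df <- data.frame(
  col1      = c(1, 1, 1, 2),
  col2      = c("a", "a", "b", "a"),
  othercol  = c(10, 20, 30, 40),
  othercol2 = c(1, 2, 3, 4)
)

# WHERE: filter rows before any grouping happens
kept <- df[df$col1 == 1, ]

# Helper columns: a constant 1 per row (for COUNT(1)) and the row-wise product
kept$one  <- 1
kept$prod <- kept$othercol * kept$othercol2

# GROUP BY col1, col2: sum each helper column within every group
res <- aggregate(
  cbind(newcol = one, newcol2 = othercol, newcol3 = prod) ~ col1 + col2,
  data = kept,
  FUN  = sum
)
print(res)
# group (1, a): newcol = 2, newcol2 = 30, newcol3 = 50
# group (1, b): newcol = 1, newcol2 = 30, newcol3 = 90
```

The point of the sketch is the order of operations: rows are filtered first, then split by the grouping columns, and each aggregate is computed once per group. COUNT(1) simply counts the rows in a group, which is what `.N` and `n()` do in the data.table and dplyr versions.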
