Question

I have a SQL query that I am trying to translate into R:

SELECT t."col1", t."col2", count(DISTINCT t."date")
FROM t
GROUP BY t."col1", t."col2"

where the R data frame looks like this:

col1 col2 date
a 1 2016-01-09
a 1 2016-01-02
a 1 2016-01-02
b 1 2016-01-07
b 1 2016-01-03
b 1 2016-01-02
b 1 2016-01-07
b 2 2016-01-11

The actual output should look like this:

col1 col2 count
a 1 2
b 1 3
b 2 1

I have looked at the count method in the plyr package, but it does not take the number of distinct dates into account.

Running this

count(t, c("col1", "col2"))

only gives the number of rows in each group, not the number of distinct dates.

How can I replicate the behavior of the SQL query in R?

Answer 1

Assuming you have your atomic-level data in a data frame named df:

library(dplyr)
df %>% 
  group_by(col1, col2) %>%
  summarise(distinct_ct = n_distinct(date))

Answer 2

Here is an option using data.table:

library(data.table)
setDT(df)[, .(distinct_ct = uniqueN(date)), by = .(col1, col2)]
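
For comparison, the same grouping can probably be done in base R without any extra packages. The following is only a minimal sketch, not part of either answer: it rebuilds the question's sample data in a data frame assumed to be called df (with the dates kept as plain character strings) and uses aggregate with length(unique(x)) as the per-group function. The output column name count is likewise just chosen for illustration.

```r
# Rebuild the sample data from the question (assumed name: df).
# Dates are stored as character strings; Date objects would work the same way.
df <- data.frame(
  col1 = c("a", "a", "a", "b", "b", "b", "b", "b"),
  col2 = c(1, 1, 1, 1, 1, 1, 1, 2),
  date = c("2016-01-09", "2016-01-02", "2016-01-02", "2016-01-07",
           "2016-01-03", "2016-01-02", "2016-01-07", "2016-01-11"),
  stringsAsFactors = FALSE
)

# For every (col1, col2) group, count how many distinct dates it contains.
result <- aggregate(date ~ col1 + col2, data = df,
                    FUN = function(x) length(unique(x)))

# Rename the aggregated column (illustrative name, mirroring the question).
names(result)[3] <- "count"

print(result)
```

On the sample data this should give one row per group with counts 2, 3 and 1, matching the expected output in the question.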

How do I group and count the occurrences of distinct values in a specific column in R?

2 answers: