Question

Here is some sample data.

created_date  start_date
2014-12-11    2014-12-10
2014-12-11    2014-12-11
2014-12-12    2014-12-13
2014-12-13    NULL       
2014-12-13    2014-12-13
2014-12-13    2014-12-13
2014-12-23    NULL
2014-12-23    NULL

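For each created_date, I want to count how many start_date values are filled in. The values of start_date themselves are not important; only the number of filled-in start_dates is meaningful.

In this case, the result of the for loop should look like this: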

created_date  count
2014-12-11     2 
2014-12-12     1
2014-12-13     2
2014-12-23     0

I can't simply use table(), because:

table(created_date) counts the created_date entries, not the start_date entries.

>table(created_date)

created_date  count
2014-12-11     2 
2014-12-12     1
2014-12-13     3
2014-12-23     2

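table(start_date) doesn't work either, because it loses the created_date of the "NULL" rows and, more importantly, the values of start_date themselves are meaningless.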

>table(start_date)

start_date    count
2014-12-10     1 
2014-12-11     1
2014-12-13     3
NULL           3

I think a for loop is needed, but I don't know how to write it. Thanks in advance!

Answer 1

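Short version: use table separately on the full data and on the rows with a missing start_date, then subtract the second from the first.

Long version:

Assuming your data is in x (and the NULLs are actually NAs; see the Gist for details):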
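As a minimal sketch, the sample data from the question could be entered as follows. Storing the dates as character strings and writing NULL as NA are assumptions, not something the question specifies:

```r
# Sample data from the question; NULL is represented as NA (assumption).
x <- data.frame(
  created_date = c("2014-12-11", "2014-12-11", "2014-12-12", "2014-12-13",
                   "2014-12-13", "2014-12-13", "2014-12-23", "2014-12-23"),
  start_date   = c("2014-12-10", "2014-12-11", "2014-12-13", NA,
                   "2014-12-13", "2014-12-13", NA, NA),
  stringsAsFactors = FALSE
)
```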

Count the entries and put them into data frames for ease of use:

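library(dplyr)
# as_data_frame() is deprecated in recent tibble releases; base as.data.frame()
# with responseName = "n" yields the columns Var1 and n used in the join.
all_counts = as.data.frame(table(x$created_date), responseName = "n", stringsAsFactors = FALSE)
na_counts = as.data.frame(table(x[is.na(x$start_date), ]$created_date), responseName = "n", stringsAsFactors = FALSE)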

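Subtract na_counts from all_counts. To do that, we first need to join the two tables. The join introduces NAs for dates that have no missing start_date, and we replace those with 0s: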

full_join(all_counts, na_counts, by = 'Var1') %>%
    mutate(n.y = ifelse(is.na(n.y), 0, n.y)) %>%
    mutate(count = n.x - n.y) %>% # And finally, subtract the counts.
    select(created_date = Var1, count)

Result:

| created_date   |   count |
|:---------------|--------:|
| 2014-12-11     |       2 |
| 2014-12-12     |       1 |
| 2014-12-13     |       2 |
| 2014-12-23     |       0 |
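
Since the question asked for a for loop, here is a minimal base R sketch that should reach the same counts without a join. It assumes the same data as above (character dates, NA for NULL); the names `dates`, `counts`, `loop_result` and `tapply_result` are purely illustrative:

```r
# Sample data from the question; NULL is represented as NA (assumption).
x <- data.frame(
  created_date = c("2014-12-11", "2014-12-11", "2014-12-12", "2014-12-13",
                   "2014-12-13", "2014-12-13", "2014-12-23", "2014-12-23"),
  start_date   = c("2014-12-10", "2014-12-11", "2014-12-13", NA,
                   "2014-12-13", "2014-12-13", NA, NA),
  stringsAsFactors = FALSE
)

# Explicit loop: for each distinct created_date, count the non-missing start_dates.
dates <- unique(x$created_date)
counts <- integer(length(dates))
for (i in seq_along(dates)) {
  rows <- x$created_date == dates[i]
  counts[i] <- sum(!is.na(x$start_date[rows]))
}
loop_result <- data.frame(created_date = dates, count = counts)

# Vectorised equivalent: sum the "is filled in" flag within each created_date group.
tapply_result <- tapply(!is.na(x$start_date), x$created_date, sum)
```

The loop mirrors the wording of the question, while the `tapply` call does the same grouping in one step and is usually the more idiomatic choice in R.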

How to count cells with a for loop in R? (table() is not applicable)
