Summarise by condition and date

Posted: 2017-09-25 13:52:17

Tags: r date dplyr

I have a daily dataset that looks like this:

date       CMA0013 CMA0047 CMA0052 CMA0067
1975-10-01       0   0.012   0.078       0
1975-10-02       0   0.012   0.078       0
1975-10-03       0   0.012   0.078       0
1975-10-04       0   0.012   0.078       0
1975-10-05       0   0.012   0.078       0
1975-10-06       0   0.012   0.078       0
...

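In R, I would like to count (summarise), by month and year, how many records in each column satisfy the condition < 0.001. In other words, I want to get something like: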

month   year    CMA0013   CMA0047   CMA0052   CMA0067
   10   1975          6         0         0         6
   11   1975        ...
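For anyone who wants to try the answers below, here is a minimal sketch that rebuilds the six sample rows shown above as a data frame named df (the name is an assumption; the answers use df, daily and d) and counts the values below 0.001 in each CMA column. Because the sample only covers October 1975, the count for that single month is just a column-wise sum:

```r
# Rebuild the six sample rows from the question (values copied from the table above)
df <- data.frame(
  date    = seq(as.Date("1975-10-01"), as.Date("1975-10-06"), by = "day"),
  CMA0013 = 0,
  CMA0047 = 0.012,
  CMA0052 = 0.078,
  CMA0067 = 0
)

# For a single month, the desired result is the per-column count of values < 0.001
counts <- colSums(df[, grep("^CMA", names(df))] < 0.001)
counts
```

This gives 6, 0, 0 and 6 for CMA0013, CMA0047, CMA0052 and CMA0067, matching the first row of the desired output.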

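I have tried different options with the aggregate and ddply functions, but since I don't know them very well yet, I could not get a satisfactory solution. Thanks in advance for any help.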

Example with ddply that does not work:
library(plyr)       # ddply()
library(lubridate)  # year(), month()

df$year <- year(df$date)
df$month <- month(df$date)

df2 <- ddply(df, ~year + month, summarise,
             count = length(df[, df$CMA0010 < 0.001]))

It does not compute the sum correctly, and it only does it for one column (CMA0010).
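In the ddply attempt above, length(df[, df$CMA0010 < 0.001]) refers to the full data frame df rather than to the monthly subset, and it uses a logical vector as a column index instead of counting rows; the count of matching values in a vector x is sum(x < 0.001). As a base-R sketch of the same split-apply-combine idea (split() and lapply() swapped in for ddply, with hypothetical sample data spanning two months), each year-month subset can be counted column by column:

```r
# Hypothetical sample data: 40 days spanning October and November 1975
df <- data.frame(
  date    = seq(as.Date("1975-10-01"), by = "day", length.out = 40),
  CMA0013 = 0,
  CMA0047 = 0.012
)

cma_cols <- grep("^CMA", names(df), value = TRUE)

# Split the rows into one data frame per year-month ("1975-10", "1975-11")
groups <- split(df, format(df$date, "%Y-%m"))

# For each group, count the values below the threshold in every CMA column
counts <- lapply(names(groups), function(key) {
  g <- groups[[key]]
  data.frame(
    year  = as.integer(substr(key, 1, 4)),
    month = as.integer(substr(key, 6, 7)),
    t(colSums(g[cma_cols] < 0.001))
  )
})

# Stack the per-month rows into one result table
result <- do.call(rbind, counts)
result
```

October contributes 31 rows and November 9, so CMA0013 is counted as 31 and 9 while CMA0047 is 0 in both months.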

3 answers:

Answer 0 (score: 1)

Here is one way...

library(lubridate) #to extract the year and month
df$year <- year(df$date)
df$month <- month(df$date)
df2 <- aggregate(df[, grep("CMA", names(df))], #just summarise columns starting "CMA"
                 by = list(year=df$year, month=df$month), 
                 function(x) sum(x<0.001))

df2
  year month CMA0013 CMA0047 CMA0052 CMA0067
1 1975    10       6       0       0       6
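If lubridate is not available, the same aggregate() approach can be sketched in base R alone by extracting the year and month with format(), assuming date is already of class Date. This variant uses the formula interface, where . ~ year + month summarises every remaining column, so only the grouping columns and the CMA columns are passed in (the sample data is hypothetical):

```r
# Hypothetical sample data with the same layout as in the question
df <- data.frame(
  date    = seq(as.Date("1975-10-01"), by = "day", length.out = 40),
  CMA0013 = 0,
  CMA0047 = 0.012
)

# Extract year and month as integers with base R instead of lubridate
df$year  <- as.integer(format(df$date, "%Y"))
df$month <- as.integer(format(df$date, "%m"))

# '.' stands for every column of 'data' not used on the right-hand side,
# so the date column is left out of 'data' before aggregating
df2 <- aggregate(. ~ year + month,
                 data = df[, c("year", "month", grep("^CMA", names(df), value = TRUE))],
                 FUN  = function(x) sum(x < 0.001))
df2
```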

Answer 1 (score: 0)

Try the lubridate package together with dplyr:
library(dplyr)

sum_df <- daily %>%
  mutate(month = lubridate::month(date),
         year  = lubridate::year(date)) %>%
  group_by(year, month) %>%
  summarise(CMA0013 = sum(CMA0013 < 0.001)
            # ...the rest of your sums, separated by commas
            )
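With dplyr 1.0.0 or later, the per-column sums do not have to be written out one by one: across() applies the same counting function to every column selected by starts_with("CMA"). A minimal sketch, assuming a data frame named daily with a Date column date (format() stands in for lubridate only to keep the example self-contained, and the sample data is hypothetical):

```r
library(dplyr)

# Hypothetical sample data standing in for the question's daily dataset
daily <- data.frame(
  date    = seq(as.Date("1975-10-01"), by = "day", length.out = 40),
  CMA0013 = 0,
  CMA0047 = 0.012
)

sum_df <- daily %>%
  mutate(year  = as.integer(format(date, "%Y")),
         month = as.integer(format(date, "%m"))) %>%
  group_by(year, month) %>%
  # Count values below the threshold in every column whose name starts with "CMA"
  summarise(across(starts_with("CMA"), ~ sum(.x < 0.001)), .groups = "drop")

sum_df
```

New CMA columns are picked up automatically, so nothing has to change in the summarise() call when the dataset grows.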

Answer 2 (score: 0)

A dplyr and lubridate solution, but one that automatically computes the sums for all CMA columns.

library(dplyr)
library(lubridate)
library(tidyr)
d %>%
    gather(key, value, -date) %>%
    mutate(year = year(date), month = month(date)) %>%
    select(-date) %>%
    group_by(year, month, key) %>%
    summarize(N = sum(value < 0.001)) %>%
    spread(key, N)

# A tibble: 1 x 6
# Groups:   year, month [1]
   year month CMA0013 CMA0047 CMA0052 CMA0067
* <dbl> <dbl>   <int>   <int>   <int>   <int>
1  1975    10       6       0       0       6
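Since tidyr 1.0.0, gather() and spread() are superseded by pivot_longer() and pivot_wider(). An equivalent sketch of the pipeline above, assuming tidyr 1.0.0 or later, dplyr 1.0.0 or later (for the .groups argument), and a data frame d with a Date column date (the sample data is hypothetical, and format() replaces lubridate to keep it self-contained):

```r
library(dplyr)
library(tidyr)

# Hypothetical sample data standing in for the question's daily dataset
d <- data.frame(
  date    = seq(as.Date("1975-10-01"), by = "day", length.out = 40),
  CMA0013 = 0,
  CMA0047 = 0.012
)

result <- d %>%
  # Long format: one row per date and CMA column
  pivot_longer(-date, names_to = "key", values_to = "value") %>%
  mutate(year  = as.integer(format(date, "%Y")),
         month = as.integer(format(date, "%m"))) %>%
  group_by(year, month, key) %>%
  summarise(N = sum(value < 0.001), .groups = "drop") %>%
  # Back to wide format: one column per CMA series
  pivot_wider(names_from = key, values_from = N)

result
```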