R - Calculating the mean with a time condition and another condition on a different column

Time: 2017-12-12 11:17:24

Tags: r time aggregate mean

I have data with a timestamp, a category, and a data value, as shown below (but with more than 2000 rows).

Timestamp   category    data  
7/16/2017 18:04 x   4.9  
7/16/2017 18:18 y   4.7  
7/16/2017 18:32 x   8.2  
7/16/2017 18:46 x   2.2  
7/16/2017 19:00 y   2.7  
7/16/2017 19:14 y   3.8  
7/16/2017 19:28 x   8.0  
7/16/2017 19:42 x   7.3  
7/16/2017 19:56 z   10.1  
7/16/2017 20:10 z   5.4  
7/16/2017 20:42 x   17.5  
7/16/2017 20:56 x   6.3  
7/16/2017 21:10 z   5.8  
7/16/2017 21:24 x   0.6  
7/16/2017 21:38 z   2.2  
7/16/2017 21:52 z   2.9  
7/16/2017 22:06 y   0.5  
7/16/2017 22:20 x   5.1  
7/16/2017 22:34 z   8.0  
7/16/2017 22:48 z   3.6  

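I want to calculate the mean and sd of my data subject to two conditions. The mean and sd must be calculated for every 2-hour interval, and they must be calculated separately for the categories x, y, and z.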

The final data should look like this:

Timestamp   category    data_avg    data_sd  
7/16/2017 18:00 x         
7/16/2017 20:00 x         
7/16/2017 22:00 x         
7/17/2017 0:00  x 

Timestamp   category    data_avg    data_sd  
7/16/2017 18:00 y       
7/16/2017 20:00 y       
7/16/2017 22:00 y         
7/17/2017 0:00  y     

Timestamp   category    data_avg    data_sd  
7/16/2017 18:00 z         
7/16/2017 20:00 z         
7/16/2017 22:00 z         
7/17/2017 0:00  z       

I tried filtering and aggregating with the following command:

df<- aggregate(list(avgdata = df$data), 
                   list(hourofday = cut(df$Timestamp, "1 hour")), 
                   mean)  

But it does not work. It is missing many data points, and it does not give the mean and the sd in the same df.

Please help.

2 answers:

Answer 0 (score: 2)

Your Timestamp column is in a format that is not easy to work with in R, so I first converted it to a date-time variable with as.POSIXlt.

df$Timestamp <- as.POSIXlt(df$Timestamp, format = "%m/%d/%Y %H:%M")

head(df)
#             Timestamp category data
# 1 2017-07-16 18:04:00        x  4.9
# 2 2017-07-16 18:18:00        y  4.7
# 3 2017-07-16 18:32:00        x  8.2
# 4 2017-07-16 18:46:00        x  2.2
# 5 2017-07-16 19:00:00        y  2.7
# 6 2017-07-16 19:14:00        y  3.8
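
As a hedged variation on the conversion above: as.POSIXct stores each timestamp as a single number and tends to sit more comfortably inside a data frame than as.POSIXlt. The sketch below rebuilds only the first three rows of the question's data, and the tz = "UTC" argument is an assumption, since the question does not state a time zone. It also counts the rows that failed to parse, which come back as NA.

```r
# Small sample mirroring the first rows of the question's data
# (the name `df` follows the question; the rows are only a subset).
df <- data.frame(
  Timestamp = c("7/16/2017 18:04", "7/16/2017 18:18", "7/16/2017 18:32"),
  category  = c("x", "y", "x"),
  data      = c(4.9, 4.7, 8.2),
  stringsAsFactors = FALSE
)

# Assumption: timestamps are treated as UTC; adjust tz to the real zone.
df$Timestamp <- as.POSIXct(df$Timestamp, format = "%m/%d/%Y %H:%M", tz = "UTC")

# Strings that do not match the format are converted to NA,
# so a non-zero count here points to rows needing attention.
n_failed <- sum(is.na(df$Timestamp))
n_failed
```

Either class works with cut(), so the aggregation step is the same whichever conversion is used.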

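After that, the aggregate function works with appropriate arguments. I added category to the list of grouping variables and changed the FUN argument to compute both the mean and the sd.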

# Group by 2-hour bin and by category; FUN returns both statistics.
# sd() is NA for bins that contain a single observation.
aggregate(list(avgdata = df$data), 
          list(hourofday = cut(df$Timestamp, "2 hour"), 
               category = df$category), 
          FUN = function(x) c(data_avg = mean(x), data_sd = sd(x)))

#             hourofday category avgdata.data_avg avgdata.data_sd
# 1 2017-07-16 18:00:00        x         6.120000        2.554799
# 2 2017-07-16 20:00:00        x         8.133333        8.597868
# 3 2017-07-16 22:00:00        x         5.100000              NA
# 4 2017-07-16 18:00:00        y         3.733333        1.001665
# 5 2017-07-16 22:00:00        y         0.500000              NA
# 6 2017-07-16 18:00:00        z        10.100000              NA
# 7 2017-07-16 20:00:00        z         4.075000        1.791415
# 8 2017-07-16 22:00:00        z         5.800000        3.111270
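
One thing worth knowing about the result above: when FUN returns more than one value, aggregate stores them in a single matrix column (here avgdata), which is why the printed names carry the avgdata. prefix. The following is a minimal sketch, not part of the original answer, of one way to flatten that column and get a separate table per category as in the question's desired output. The names res and flat, the renamed columns, and tz = "UTC" are assumptions made for illustration.

```r
# Sample data copied from the question
df <- data.frame(
  Timestamp = c("7/16/2017 18:04", "7/16/2017 18:18", "7/16/2017 18:32",
                "7/16/2017 18:46", "7/16/2017 19:00", "7/16/2017 19:14",
                "7/16/2017 19:28", "7/16/2017 19:42", "7/16/2017 19:56",
                "7/16/2017 20:10", "7/16/2017 20:42", "7/16/2017 20:56",
                "7/16/2017 21:10", "7/16/2017 21:24", "7/16/2017 21:38",
                "7/16/2017 21:52", "7/16/2017 22:06", "7/16/2017 22:20",
                "7/16/2017 22:34", "7/16/2017 22:48"),
  category = c("x", "y", "x", "x", "y", "y", "x", "x", "z", "z",
               "x", "x", "z", "x", "z", "z", "y", "x", "z", "z"),
  data = c(4.9, 4.7, 8.2, 2.2, 2.7, 3.8, 8.0, 7.3, 10.1, 5.4,
           17.5, 6.3, 5.8, 0.6, 2.2, 2.9, 0.5, 5.1, 8.0, 3.6),
  stringsAsFactors = FALSE
)
# Assumption: timestamps are treated as UTC
df$Timestamp <- as.POSIXct(df$Timestamp, format = "%m/%d/%Y %H:%M", tz = "UTC")

# Same grouping as above: 2-hour bins crossed with category
res <- aggregate(list(avgdata = df$data),
                 list(hourofday = cut(df$Timestamp, "2 hour"),
                      category = df$category),
                 FUN = function(x) c(data_avg = mean(x), data_sd = sd(x)))

# res$avgdata is a two-column matrix; do.call(data.frame, ...) splits it
# into ordinary columns, which are then given shorter names.
flat <- do.call(data.frame, res)
names(flat)[3:4] <- c("data_avg", "data_sd")

# One data frame per category (a named list with elements x, y, z)
by_category <- split(flat, flat$category)
by_category
```

Note that aggregate only returns combinations of bin and category that actually occur in the data, so empty 2-hour windows do not appear as rows.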

Answer 1 (score: 1)

select
  s.inst_name,
  sa.account_type,
  count(*) as total
from staffs s
join
(
  select 
    sid, 
    case when count(*) = 1 then 'SINGLE' else 'DOUBLE' end as account_type
  from staffaccounts
  group by sid
  having count(*) <= 2
) sa on sa.sid = s.sid
group by sa.account_type, s.inst_name
order by sa.account_type, s.inst_name;