I have data with a timestamp, a category, and a data value, as shown below (the real data has more than 2000 rows).
Timestamp category data
7/16/2017 18:04 x 4.9
7/16/2017 18:18 y 4.7
7/16/2017 18:32 x 8.2
7/16/2017 18:46 x 2.2
7/16/2017 19:00 y 2.7
7/16/2017 19:14 y 3.8
7/16/2017 19:28 x 8.0
7/16/2017 19:42 x 7.3
7/16/2017 19:56 z 10.1
7/16/2017 20:10 z 5.4
7/16/2017 20:42 x 17.5
7/16/2017 20:56 x 6.3
7/16/2017 21:10 z 5.8
7/16/2017 21:24 x 0.6
7/16/2017 21:38 z 2.2
7/16/2017 21:52 z 2.9
7/16/2017 22:06 y 0.5
7/16/2017 22:20 x 5.1
7/16/2017 22:34 z 8.0
7/16/2017 22:48 z 3.6
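I want to compute the mean and sd of my data subject to two conditions: the mean and sd must be computed for every 2-hour window, and they must be computed separately for each of the categories x, y, and z.
The final data should look like this: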
Timestamp category data_avg data_sd
7/16/2017 18:00 x
7/16/2017 20:00 x
7/16/2017 22:00 x
7/17/2017 0:00 x
Timestamp category data_avg data_sd
7/16/2017 18:00 y
7/16/2017 20:00 y
7/16/2017 22:00 y
7/17/2017 0:00 y
Timestamp category data_avg data_sd
7/16/2017 18:00 z
7/16/2017 20:00 z
7/16/2017 22:00 z
7/17/2017 0:00 z
I tried filtering and aggregating with the following command:
df <- aggregate(list(avgdata = df$data),
                list(hourofday = cut(df$Timestamp, "1 hour")),
                mean)
But it doesn't work. It drops many data points, and it doesn't give the mean and sd together in the same df.
Please help.
Answer 0 (score: 2)
Your Timestamp column is in a format that is not easy to work with in R, so I first convert it to a date-time variable with as.POSIXlt.
df$Timestamp <- as.POSIXlt(df$Timestamp, format = "%m/%d/%Y %H:%M")
head(df)
# Timestamp category data
# 1 2017-07-16 18:04:00 x 4.9
# 2 2017-07-16 18:18:00 y 4.7
# 3 2017-07-16 18:32:00 x 8.2
# 4 2017-07-16 18:46:00 x 2.2
# 5 2017-07-16 19:00:00 y 2.7
# 6 2017-07-16 19:14:00 y 3.8
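Before aggregating, it can be worth checking that every timestamp actually matched the format string. The following is a minimal sketch, assuming the timestamps arrive as character strings in month/day/year order; the vector `raw` and its third, deliberately malformed entry are hypothetical and are not taken from the question's data, and `tz = "UTC"` is an assumption made only to keep the example reproducible.

```r
# Hypothetical sample strings; the third one is in day/month order on purpose
raw <- c("7/16/2017 18:04", "7/16/2017 22:48", "16/7/2017 18:04")

# tz = "UTC" is an assumption so the result does not depend on the local time zone
parsed <- as.POSIXlt(raw, format = "%m/%d/%Y %H:%M", tz = "UTC")

# Strings that do not match the format become NA instead of raising an error,
# so counting the NA values is a cheap sanity check after the conversion
bad <- is.na(parsed)
sum(bad)
```

Any rows flagged here would otherwise be dropped silently by the grouping step, which is one way data points can appear to go missing.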
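After that, the aggregate function works once it is given the right arguments. I added category to the list of grouping variables and changed the FUN argument to compute both the mean and the sd.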
aggregate(list(avgdata = df$data),
          list(hourofday = cut(df$Timestamp, "2 hour"),
               category = df$category),
          FUN = function(x) c(data_avg = mean(x), data_sd = sd(x)))
# data_sd is NA for windows that contain only one observation
# hourofday category avgdata.data_avg avgdata.data_sd
# 1 2017-07-16 18:00:00 x 6.120000 2.554799
# 2 2017-07-16 20:00:00 x 8.133333 8.597868
# 3 2017-07-16 22:00:00 x 5.100000 NA
# 4 2017-07-16 18:00:00 y 3.733333 1.001665
# 5 2017-07-16 22:00:00 y 0.500000 NA
# 6 2017-07-16 18:00:00 z 10.100000 NA
# 7 2017-07-16 20:00:00 z 4.075000 1.791415
# 8 2017-07-16 22:00:00 z 5.800000 3.111270
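When FUN returns more than one value, aggregate() stores the results as a single matrix column rather than as separate columns, and the question also asks for one table per category. The sketch below shows one possible way to handle both, using only the first ten rows of the question's sample data. The names `res`, `flat`, and `by_category` are illustrative, and as.POSIXct with `tz = "UTC"` is used here in place of the answer's as.POSIXlt as an assumption to keep the example reproducible.

```r
# First ten rows of the sample data from the question
df <- data.frame(
  Timestamp = c("7/16/2017 18:04", "7/16/2017 18:18", "7/16/2017 18:32",
                "7/16/2017 18:46", "7/16/2017 19:00", "7/16/2017 19:14",
                "7/16/2017 19:28", "7/16/2017 19:42", "7/16/2017 19:56",
                "7/16/2017 20:10"),
  category = c("x", "y", "x", "x", "y", "y", "x", "x", "z", "z"),
  data = c(4.9, 4.7, 8.2, 2.2, 2.7, 3.8, 8.0, 7.3, 10.1, 5.4),
  stringsAsFactors = FALSE
)

# Assumption: as.POSIXct with an explicit time zone instead of as.POSIXlt
df$Timestamp <- as.POSIXct(df$Timestamp, format = "%m/%d/%Y %H:%M", tz = "UTC")

# Mean and sd per 2-hour window and category
res <- aggregate(list(avgdata = df$data),
                 list(hourofday = cut(df$Timestamp, "2 hour"),
                      category = df$category),
                 FUN = function(x) c(data_avg = mean(x), data_sd = sd(x)))

# Spread the matrix column into ordinary columns
# (avgdata.data_avg and avgdata.data_sd)
flat <- do.call(data.frame, res)

# One data frame per category
by_category <- split(flat, flat$category)
names(by_category)
```

Windows that contain a single observation get NA for the sd, since sd() needs at least two values.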