My table looks like this:
dt<-data.frame(Date=c("2011-01-16","2011-01-16","2011-07-08","2011-07-09","2011-07-09","2011-08-17","2011-09-10","2011-09-11","2011-09-11"),Number=c(7,7,NA,1,1,NA,7,5,6),Hour=c(0.25,0.25,NA,0.6,0.6,NA,2,0.25,0.25))
        Date Number Hour
1 2011-01-16      7 0.25
2 2011-01-16      7 0.25
3 2011-07-08     NA   NA
4 2011-07-09      1 0.60
5 2011-07-09      1 0.60
6 2011-08-17     NA   NA
7 2011-09-10      7 2.00
8 2011-09-11      5 0.25
9 2011-09-11      6 0.25
I want to sum Hour by Number and Date. The output should look like this:
        Date "1"  "5"  "6" "7"
1 2011-01-16  NA   NA   NA 0.5
2 2011-07-08  NA   NA   NA  NA
3 2011-07-09 1.2   NA   NA  NA
4 2011-08-17  NA   NA   NA  NA
5 2011-09-10  NA   NA   NA 2.0
6 2011-09-11  NA 0.25 0.25  NA
Could you suggest a function that produces this output?
Answer 0 (score: 2)
You can use the aggregate function for this.
dt$Date <- as.character(dt$Date)
aggregate(dt$Hour, by = list(dt$Number, dt$Date), FUN = function(x) sum(x, na.rm = TRUE))
Alternatively, you can use this (this time without removing NAs):
with(dt, aggregate(Hour, by = list(Number, Date), FUN = sum))
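Note that aggregate returns a long table (one row per Number/Date pair), not the wide layout shown in the question. As a minimal base R sketch, which is not part of the original answer, tapply can build the wide layout directly, because it leaves combinations that never occur as NA. The names wide and wide_df are only illustrative.

```r
dt <- data.frame(
  Date = c("2011-01-16", "2011-01-16", "2011-07-08", "2011-07-09", "2011-07-09",
           "2011-08-17", "2011-09-10", "2011-09-11", "2011-09-11"),
  Number = c(7, 7, NA, 1, 1, NA, 7, 5, 6),
  Hour = c(0.25, 0.25, NA, 0.6, 0.6, NA, 2, 0.25, 0.25)
)

# Sum Hour for every Date x Number cell. Rows whose Number is NA drop out of
# the grouping, but their Date still appears as a row filled with NA.
wide <- tapply(dt$Hour, list(Date = dt$Date, Number = dt$Number), sum)

# Optional: turn the matrix into a data frame with Date as a regular column
wide_df <- data.frame(Date = rownames(wide), wide,
                      check.names = FALSE, row.names = NULL)
wide_df
```

The result has one row per date and one column per non-missing Number value.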
Answer 1 (score: 1)
We can use the fun.aggregate argument of dcast:
library(data.table)
dcast(setDT(dt), Date + Hour ~ Number, sum)
If the OP wants NA for combinations that do not occur, add a condition, because sum of a zero-length vector (sum(integer(0))) returns 0:
dcast(setDT(dt), Date + Hour ~ Number, function(x)
if(length(x) == 0) NA_real_ else sum(x, na.rm = TRUE))[,
.(Date, Hour, `1`, `5`, `6`, `7`)]
#         Date Hour   1    5    6   7
#1: 2011-01-16 0.25  NA   NA   NA 0.5
#2: 2011-07-08   NA  NA   NA   NA  NA
#3: 2011-07-09 0.60 1.2   NA   NA  NA
#4: 2011-08-17   NA  NA   NA   NA  NA
#5: 2011-09-10 2.00  NA   NA   NA 2.0
#6: 2011-09-11 0.25  NA 0.25 0.25  NA
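The zero-length behaviour that motivates the condition can be checked directly in base R. The sketch below uses a hypothetical helper name, sum_or_na, which wraps the same logic as the anonymous function passed to dcast.

```r
# The sum of an empty vector is 0, not NA
empty_sum <- sum(integer(0))

# Hypothetical helper: NA for an empty group, otherwise the sum ignoring NAs
sum_or_na <- function(x) {
  if (length(x) == 0) NA_real_ else sum(x, na.rm = TRUE)
}

empty_result  <- sum_or_na(numeric(0))     # empty group
filled_result <- sum_or_na(c(0.25, 0.25))  # ordinary group
all_na_result <- sum_or_na(NA_real_)       # group containing only NA
```

One caveat: a group that exists but holds only NA values has length 1, so it still sums to 0 once na.rm = TRUE removes the NA. Only combinations with no rows at all become NA.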
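Answer 2 (score: 1)
We can group_by Date and Number, sum Hour for each group, and use spread to change the result to wide format. However, this also produces an NA column (because Number contains NA values), which can be removed if it is not needed.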
library(dplyr)
dt %>%
group_by(Date, Number) %>%
summarise(Hour = sum(Hour, na.rm = TRUE)) %>%
tidyr::spread(Number, Hour) %>%
select(-`<NA>`)
# Date `1` `5` `6` `7`
# <fct> <dbl> <dbl> <dbl> <dbl>
#1 2011-01-16 NA NA NA 0.5
#2 2011-07-08 NA NA NA NA
#3 2011-07-09 1.2 NA NA NA
#4 2011-08-17 NA NA NA NA
#5 2011-09-10 NA NA NA 2
#6 2011-09-11 NA 0.25 0.25 NA
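In current tidyr releases spread is superseded by pivot_wider. The following is a sketch of the same pipeline under that assumption (dplyr 1.0 or later and tidyr 1.1 or later installed); it is not part of the original answer. The any_of() call is a defensive choice so the pipeline does not fail whichever name the missing-value column receives.

```r
library(dplyr)
library(tidyr)

dt <- data.frame(
  Date = c("2011-01-16", "2011-01-16", "2011-07-08", "2011-07-09", "2011-07-09",
           "2011-08-17", "2011-09-10", "2011-09-11", "2011-09-11"),
  Number = c(7, 7, NA, 1, 1, NA, 7, 5, 6),
  Hour = c(0.25, 0.25, NA, 0.6, 0.6, NA, 2, 0.25, 0.25)
)

wide <- dt %>%
  group_by(Date, Number) %>%
  # .groups = "drop" returns an ungrouped tibble
  summarise(Hour = sum(Hour, na.rm = TRUE), .groups = "drop") %>%
  # names_sort = TRUE orders the new columns by Number rather than by first appearance
  pivot_wider(names_from = Number, values_from = Hour, names_sort = TRUE) %>%
  # drop the column created for rows where Number is NA
  select(-any_of(c("NA", "<NA>")))

wide
```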