I have the following data frame in R:
Date                 ID
01-01-2017 12:39:00 CDF
01-01-2017 01:39:00 WED
01-01-2017 02:39:00 QWE
01-01-2017 05:39:00 TYU
01-01-2017 17:39:00 ERT
02-01-2017 02:30:34 DEF
I want to count the number of IDs per hour. My desired data frame is:
Date hours Count
01-01-2017 00:00 - 01:00 1
01-01-2017 01:00 - 02:00 1
01-01-2017 02:00 - 03:00 1
01-01-2017 03:00 - 04:00 0
01-01-2017 04:00 - 05:00 0
01-01-2017 05:00 - 06:00 1
.
01-01-2017 23:00 - 00:00 0
.
02-01-2017 12:00 - 01:00 0
02-01-2017 01:00 - 02:00 0
02-01-2017 02:00 - 03:00 1
If no ID falls in an hour, I want that hourly bucket to be zero. Every date should cover all 24 hours.
How can I achieve this in R?
Answer 0 (score: 1)
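Here is a solution using lubridate and base R.
In the dataset you provided, your first observation is 01-01-2017 12:39:00, but your desired output has a count for 00:00 - 01:00. In the code below, 12:39:00 would be treated as 12:39 pm, so I will assume you meant 00:39:00. If that is not the case, please let me know.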
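To illustrate the point about 12:39:00, here is a small base R sketch showing how the %H format code reads the hour field. The variable names and the fixed UTC time zone are assumptions made for the example, not part of the original data.

```r
# %H is the 24-hour clock, so "12" is parsed as noon, not midnight
fmt <- "%d-%m-%Y %H:%M:%S"
noon     <- as.POSIXct("01-01-2017 12:39:00", format = fmt, tz = "UTC")
midnight <- as.POSIXct("01-01-2017 00:39:00", format = fmt, tz = "UTC")

format(noon, "%H")      # "12": falls in the 12:00 - 13:00 bucket
format(midnight, "%H")  # "00": falls in the 00:00 - 01:00 bucket
```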
library(lubridate)
# the data
txt <- "Date,ID
01-01-2017 00:39:00,CDF
01-01-2017 01:39:00,WED
01-01-2017 02:39:00,QWE
01-01-2017 05:39:00,TYU
01-01-2017 17:39:00,ERT
02-01-2017 02:30:34,DEF"
df <- read.table(text = txt,sep = ",", header = TRUE)
# transforming the date strings into dates
dates <- as.POSIXct(strptime(df$Date, "%d-%m-%Y %H:%M:%S"))
# creating an hourly time sequence from start to end
total_time <- seq(from = floor_date(min(dates), "hour"),
                  to = ceiling_date(max(dates), "hour"),
                  by = "hour")
# in case there is more than one occurrence per interval
count <- sapply(total_time, function(x) {
  sum(floor_date(dates, "hour") %in% x)
})
data.frame(Date = strftime(total_time, format = "%d-%m-%Y"),
           hours = paste(strftime(total_time, format = "%H:%M"),
                         strftime(total_time + 60*60, format = "%H:%M"),
                         sep = " - "),
           Count = count)
#         Date         hours Count
# 1 01-01-2017 00:00 - 01:00     1
# 2 01-01-2017 01:00 - 02:00     1
# 3 01-01-2017 02:00 - 03:00     1
# 4 01-01-2017 03:00 - 04:00     0
# 5 01-01-2017 04:00 - 05:00     0
# 6 01-01-2017 05:00 - 06:00     1
# 7 01-01-2017 06:00 - 07:00     0
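Note that the sequence above runs from the first observed hour to the last one, so the final day is not padded to 24 buckets. If every date must cover all 24 hours, as the question asks, one possible variation is sketched below in base R only. The helper name hourly_counts and the fixed UTC time zone are assumptions for illustration, not part of the answer above.

```r
# Hypothetical helper: count timestamps per hour, padded so that every
# calendar day between the first and last observation has 24 buckets.
hourly_counts <- function(x, tz = "UTC") {
  # Parse "dd-mm-YYYY HH:MM:SS" strings; a fixed tz avoids DST gaps
  dates <- as.POSIXct(x, format = "%d-%m-%Y %H:%M:%S", tz = tz)
  # Truncate each timestamp to the start of its hour
  hour_start <- as.POSIXct(trunc(dates, units = "hours"))
  # Buckets run from midnight of the first day to 23:00 of the last day
  first <- as.POSIXct(trunc(min(dates), units = "days"))
  last  <- as.POSIXct(trunc(max(dates), units = "days")) + 23 * 3600
  buckets <- seq(from = first, to = last, by = "hour")
  # Count how many observations fall into each bucket (0 if none)
  idx <- match(as.numeric(hour_start), as.numeric(buckets))
  data.frame(
    Date  = format(buckets, "%d-%m-%Y"),
    hours = paste(format(buckets, "%H:%M"),
                  format(buckets + 3600, "%H:%M"), sep = " - "),
    Count = tabulate(idx, nbins = length(buckets)),
    stringsAsFactors = FALSE
  )
}

res <- hourly_counts(c("01-01-2017 00:39:00", "01-01-2017 01:39:00",
                       "01-01-2017 02:39:00", "01-01-2017 05:39:00",
                       "01-01-2017 17:39:00", "02-01-2017 02:30:34"))
nrow(res)  # 48: two days times 24 hourly buckets
```

Using tabulate on the matched bucket positions also handles several observations in the same hour, which is what the sapply call in the answer is guarding against.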
Answer 1 (score: 1)
The tidyverse provides some useful functions for this, such as count / tally and complete.
library(tidyverse)
library(lubridate)
dat <- read_csv('Date, ID
01-01-2017 12:39:00, CDF
01-01-2017 01:39:00, WED
01-01-2017 02:39:00, QWE
01-01-2017 05:39:00, TYU
01-01-2017 17:39:00, ERT
02-01-2017 02:30:34, DEF'
)
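dat %>%
  mutate(
    Date = dmy_hms(Date),
    day = floor_date(Date, 'day'),
    hour = hour(Date)
  ) %>%
  group_by(day, hour) %>%
  tally() %>%
  # the result is still grouped by day, so complete() fills the hours within each day;
  # recent tidyr versions do not allow completing on a grouping column
  complete(hour = 0:23, fill = list(n = 0))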
## A tibble: 48 x 3
## Groups: day [2]
#    day         hour     n
#    <dttm>     <int> <dbl>
#  1 2017-01-01     0     0
#  2 2017-01-01     1     1
#  3 2017-01-01     2     1
#  4 2017-01-01     3     0
#  5 2017-01-01     4     0
#  6 2017-01-01     5     1
#  7 2017-01-01     6     0
#  8 2017-01-01     7     0
#  9 2017-01-01     8     0
# 10 2017-01-01     9     0
## ... with 38 more rows
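The tibble above stores the hour as an integer rather than the "00:00 - 01:00" style label asked for in the question. A minimal base R sketch for building that label follows; hour_label is a hypothetical helper name introduced here for illustration.

```r
# Hypothetical helper: turn an integer hour (0-23) into a "HH:00 - HH:00" label.
# The modulo wraps hour 23 around to 00 for the end of the interval.
hour_label <- function(h) {
  sprintf("%02d:00 - %02d:00", as.integer(h), as.integer((h + 1) %% 24))
}

labels <- hour_label(c(0, 5, 23))
labels
# "00:00 - 01:00" "05:00 - 06:00" "23:00 - 00:00"
```

Assuming the column names from the pipeline above, it could be applied with something like mutate(hours = hour_label(hour)) as a final step.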