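I am currently working with food truck data, organized by applicant, day of the week, start time and end time.
I have been asked to create separate rows or columns that show whether each hour falls within the range between the start time and the end time (Open, Not Open).
Is there a way to ask R to go through every hour of the day, mark the hours that fall between the start time and end time as Open, and mark every hour outside that range as Not Open?
I tried using a for loop, but without success.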
for (Yes in c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12",
              "13", "14", "15", "16", "17", "18", "19", "20", "21", "22", "23", "24"))
{
  print(Yes)
  if (Yes %in% (NSFS$starthour %between% NSFS$endhour))
}
DayOfWeekStr           Applicant starthour endhour locationid
      Friday    Natan's Catering        12      13     437207
      Friday    Linda's Catering        10      15     760539
   Wednesday  Mang Hang Catering        12      13     559779
      Sunday       Tacos Santana        17      22     453014
      Friday Breaking Bread Inc.        14      18     934995
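A loop over hours can work if the hours are numbers rather than strings and each hour is compared element-wise against the starthour and endhour columns, instead of applying %between% to the two columns. A minimal base R sketch, assuming the end hour itself counts as open (as with seq(start, end) in the answer below) and using a hypothetical hour_1 to hour_24 column naming; it adds one Open/Not Open column per hour:

```r
# Sample data from the question (the end hour is assumed to be an open hour)
NSFS <- data.frame(
  DayOfWeekStr = c("Friday", "Friday", "Wednesday", "Sunday", "Friday"),
  Applicant    = c("Natan's Catering", "Linda's Catering", "Mang Hang Catering",
                   "Tacos Santana", "Breaking Bread Inc."),
  starthour    = c(12L, 10L, 12L, 17L, 14L),
  endhour      = c(13L, 15L, 13L, 22L, 18L),
  locationid   = c(437207L, 760539L, 559779L, 453014L, 934995L),
  stringsAsFactors = FALSE
)

# Loop over numeric hours; the comparison is vectorised over all rows,
# so each iteration fills one whole column
for (h in 1:24) {
  col <- paste0("hour_", h)  # hypothetical column naming scheme
  NSFS[[col]] <- ifelse(NSFS$starthour <= h & h <= NSFS$endhour,
                        "Open", "Not Open")
}

NSFS[, c("Applicant", "hour_12", "hour_16")]
```

If a truck should count as closed during its end hour, change h <= NSFS$endhour to h < NSFS$endhour.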
Answer 0 (score: 0)
I assume you have an input table in which start_hour and end_hour are integers, for example:
# applicant day start_hour end_hour
#1 a monday 9 10
#2 a monday 12 12
#3 a monday 14 16
#4 a monday 17 18
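You can use seq to find all the hours between the start and the end. The idea in the code below is to build one data.table with the open hours (dt_open_hours) and another data.table (dt_all_hours) with all possible hours for the applicants and days present in the input data. Merging the two data.tables gives every combination of applicant, day and hour, but the state (Open / Not Open) comes only from dt_open_hours. The last step converts the missing values (NA) to "Not Open":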
library(data.table)
dt <- structure(list(applicant = structure(c(1L, 1L, 1L, 1L), .Label = "a", class = "factor"),
day = structure(c(1L, 1L, 1L, 1L), .Label = "monday", class = "factor"),
start_hour = c(9L, 12L, 14L, 17L), end_hour = c(10L, 12L,
16L, 18L)), .Names = c("applicant", "day", "start_hour",
"end_hour"), class = "data.frame", row.names = c(NA, -4L))
# Convert to data.table
setDT(dt)
# Give each row a unique row_id so that seq() is evaluated one row at a time
dt[, row_id := seq_len(.N)]
# Convert start and end hour into sequence of hours between start and end
dt_open_hours <- dt[, .(state = "Open",
hour = as.integer(seq(from = start_hour, to = end_hour, by = 1))),
by = .(row_id, applicant, day)]
# Remove row_id column
dt_open_hours[, row_id := NULL]
# Generate data.table with all combinations of applicant, day and hour
dt_all_hours <- CJ(applicant = unique(dt_open_hours[, applicant]),
day = unique(dt_open_hours[, day]), hour = seq(1, 24))
# Merge: keep every row of dt_all_hours; hours not in dt_open_hours get state NA
out <- dt_open_hours[dt_all_hours, on=.(applicant, day, hour)]
out[is.na(state), state := "Not Open"]
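The same expand-then-merge idea can be sketched in base R without data.table: lapply() with seq() plays the role of the by = row_id expansion, expand.grid() stands in for CJ(), and merge(..., all.y = TRUE) keeps every hour. A minimal sketch on the example input above, assuming start_hour <= end_hour in every row (the variable names here are illustrative):

```r
# Example input from the answer, as plain character/integer columns
input <- data.frame(
  applicant  = "a",
  day        = "monday",
  start_hour = c(9L, 12L, 14L, 17L),
  end_hour   = c(10L, 12L, 16L, 18L),
  stringsAsFactors = FALSE
)

# Expand each row into one row per open hour (start and end inclusive)
open_list <- lapply(seq_len(nrow(input)), function(i) {
  data.frame(
    applicant = input$applicant[i],
    day       = input$day[i],
    hour      = seq(input$start_hour[i], input$end_hour[i]),
    state     = "Open",
    stringsAsFactors = FALSE
  )
})
# unique() drops duplicate hours coming from overlapping intervals
open_hours <- unique(do.call(rbind, open_list))

# Every combination of applicant, day and hour 1..24 (analogue of CJ())
all_hours <- expand.grid(
  applicant = unique(input$applicant),
  day       = unique(input$day),
  hour      = 1:24,
  stringsAsFactors = FALSE
)

# Right join: keep all rows of all_hours; unmatched hours get NA state
out <- merge(open_hours, all_hours,
             by = c("applicant", "day", "hour"), all.y = TRUE)
out$state[is.na(out$state)] <- "Not Open"
out <- out[order(out$applicant, out$day, out$hour), ]
out
```

Unlike the data.table version, this sketch removes duplicate open hours with unique(), so overlapping intervals for the same applicant and day do not produce duplicate rows after the merge.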
The resulting data.table looks like this:
out
# applicant day state hour
# 1: a monday Not Open 1
# 2: a monday Not Open 2
# 3: a monday Not Open 3
# 4: a monday Not Open 4
# 5: a monday Not Open 5
# 6: a monday Not Open 6
# 7: a monday Not Open 7
# 8: a monday Not Open 8
# 9: a monday Open 9
#10: a monday Open 10
#11: a monday Not Open 11
#12: a monday Open 12
#13: a monday Not Open 13
#14: a monday Open 14
#15: a monday Open 15
#16: a monday Open 16
#17: a monday Open 17
#18: a monday Open 18
#19: a monday Not Open 19
#20: a monday Not Open 20
#21: a monday Not Open 21
#22: a monday Not Open 22
#23: a monday Not Open 23
#24: a monday Not Open 24
# applicant day state hour
Update: using the following updated input data.frame:
  DayOfWeekStr           Applicant starthour endhour locationid
1       Friday    Natan's Catering        12      13     437207
2       Friday    Linda's Catering        10      15     760539
3    Wednesday  Mang Hang Catering        12      13     559779
4       Sunday       Tacos Santana        17      22     453014
5       Friday Breaking Bread Inc.        14      18     934995
A few modifications were made to the code (besides using the column names from the updated input data.frame), mainly to incorporate the locationid variable:
- dt_open_hours contains the hours between starthour and endhour for each unique combination of Applicant, DayOfWeekStr and locationid.
- dt_all_hours contains all 24 hours for the same unique combinations of Applicant, DayOfWeekStr and locationid.
- dt_open_hours and dt_all_hours are merged on Applicant, DayOfWeekStr, locationid and hour (locationid is new).
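A base R sketch of the locationid modifications described above, assuming the updated column names; it is not the answer's own data.table code. Here the 24-hour grid is built only for the combinations of Applicant, DayOfWeekStr and locationid that occur in the data, via a Cartesian merge(..., by = NULL):

```r
# Updated input from the question
NSFS <- data.frame(
  DayOfWeekStr = c("Friday", "Friday", "Wednesday", "Sunday", "Friday"),
  Applicant    = c("Natan's Catering", "Linda's Catering", "Mang Hang Catering",
                   "Tacos Santana", "Breaking Bread Inc."),
  starthour    = c(12L, 10L, 12L, 17L, 14L),
  endhour      = c(13L, 15L, 13L, 22L, 18L),
  locationid   = c(437207L, 760539L, 559779L, 453014L, 934995L),
  stringsAsFactors = FALSE
)

keys <- c("Applicant", "DayOfWeekStr", "locationid")

# One row per open hour, with locationid kept as part of the key
open_list <- lapply(seq_len(nrow(NSFS)), function(i) {
  data.frame(
    Applicant    = NSFS$Applicant[i],
    DayOfWeekStr = NSFS$DayOfWeekStr[i],
    locationid   = NSFS$locationid[i],
    hour         = seq(NSFS$starthour[i], NSFS$endhour[i]),
    state        = "Open",
    stringsAsFactors = FALSE
  )
})
open_hours <- unique(do.call(rbind, open_list))

# 24 hours for each existing key combination (by = NULL gives a cross join)
combos    <- unique(NSFS[, keys])
all_hours <- merge(combos, data.frame(hour = 1:24), by = NULL)

# Merge on the three keys plus hour; unmatched hours become "Not Open"
out <- merge(open_hours, all_hours, by = c(keys, "hour"), all.y = TRUE)
out$state[is.na(out$state)] <- "Not Open"
out <- out[order(out$Applicant, out$DayOfWeekStr, out$locationid, out$hour), ]
head(out, 24)
```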