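I am working with a data frame that contains cases spanning a period of time, say 10/01/18 to 12/31/18. Currently I have a script that lets me subset the data by date, but it requires entering the specific dates by hand. Here is the script with a dummy dataset: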
> mydata
date dummy
1 2018-10-01 21:41:00 A
2 2018-10-03 21:41:00 B
3 2018-10-12 21:41:00 C
4 2018-11-01 21:41:00 D
5 2018-11-02 21:41:00 E
6 2018-11-12 21:41:00 F
7 2018-11-15 21:41:00 G
8 2018-12-02 21:41:00 H
9 2018-12-07 21:41:00 I
10 2018-12-12 21:41:00 J
# Put date into a date-time format (the format string matches the data as printed above)
mydata$date <- as.POSIXct(mydata$date, format="%Y-%m-%d %H:%M:%S")
# TOCHANGE: Adjust time points accordingly.
# Half-open intervals, so rows late on the last day of a month are not dropped
t1 = mydata[mydata$date >= "2018-10-01" & mydata$date < "2018-11-01",]
t2 = mydata[mydata$date >= "2018-11-01" & mydata$date < "2018-12-01",]
t3 = mydata[mydata$date >= "2018-12-01" & mydata$date < "2019-01-01",]
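I feel this could be done more efficiently with a function, especially since I want the subsets to cover different time intervals (e.g. weekly, biweekly, monthly). I am thinking of a function that takes the length of each subset in days as input and then loops over the full time span of the data frame to generate the subsets. Or, instead of a number of days, would it make more sense to take the number of subsets as input?
How would you write a function that does this? Thanks in advance for your help!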
Answer 0 (score: 0)
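Consider assigning a month variable and then using split to build a list of data frames, which is easier to manage than many separate, similarly structured monthly data frames.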
# The format string matches the dates as printed in the question (YYYY-MM-DD HH:MM:SS)
mydata$date <- as.POSIXct(mydata$date, format="%Y-%m-%d %H:%M:%S")
mydata$month <- format(mydata$date,"%m")
month_df_list <- split(mydata, mydata$month)
# OCTOBER DATA FRAME
month_df_list$`10`
# NOVEMBER DATA FRAME
month_df_list$`11`
# DECEMBER DATA FRAME
month_df_list$`12`
Note that a data frame loses none of its functionality when it is stored in a list. To rename the list elements:
month_df_list <- setNames(month_df_list, paste0("t", seq_along(month_df_list)))
# OCTOBER DATA FRAME
month_df_list$t1
# NOVEMBER DATA FRAME
month_df_list$t2
# DECEMBER DATA FRAME
month_df_list$t3
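The same split idea can be extended to the weekly or biweekly intervals the question asks about. The following is only a minimal sketch, not a definitive implementation: split_by_interval is a hypothetical helper name, and it assumes the date column is already POSIXct. It relies on base R's cut() for date-times, which accepts interval specifications such as "week", "2 weeks", "14 days" or "month".

```r
# Hypothetical helper (not part of the original answer): split a data frame
# into a list of subsets, one per time interval.
split_by_interval <- function(df, date_col = "date", by = "month") {
  # cut() assigns each date-time to an interval of the requested width
  bins <- cut(df[[date_col]], breaks = by)
  # drop = TRUE discards intervals that contain no rows
  split(df, bins, drop = TRUE)
}

# Dummy data mirroring the question; the UTC time zone is an assumption
mydata <- data.frame(
  date = as.POSIXct(c("2018-10-01 21:41:00", "2018-10-03 21:41:00",
                      "2018-10-12 21:41:00", "2018-11-01 21:41:00",
                      "2018-11-02 21:41:00", "2018-11-12 21:41:00",
                      "2018-11-15 21:41:00", "2018-12-02 21:41:00",
                      "2018-12-07 21:41:00", "2018-12-12 21:41:00"),
                    tz = "UTC"),
  dummy = LETTERS[1:10]
)

monthly  <- split_by_interval(mydata, by = "month")
weekly   <- split_by_interval(mydata, by = "week")
biweekly <- split_by_interval(mydata, by = "2 weeks")
```

Each result is a named list whose names come from the interval start dates, so individual subsets can be reached by name or by position, e.g. monthly[[1]]. Taking the interval width as input (rather than the number of subsets) keeps the intervals aligned to calendar weeks or months.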
Answer 1 (score: 0)
A data.table approach
library( data.table )
Sample data
dt <- fread("id date dummy
1 2018-10-01T21:41:00 A
2 2018-10-03T21:41:00 B
3 2018-10-12T21:41:00 C
4 2018-11-01T21:41:00 D
5 2018-11-02T21:41:00 E
6 2018-11-12T21:41:00 F
7 2018-11-15T21:41:00 G
8 2018-12-02T21:41:00 H
9 2018-12-07T21:41:00 I
10 2018-12-12T21:41:00 J", header = TRUE)
#set dates as Date
dt[, date := as.Date( date, format = "%Y-%m-%dT%H:%M:%S", tz = "Europe/Amsterdam" )]
Subsetting
#subset by month == 10
dt[ month(date) == 10,]
# id date dummy
# 1: 1 2018-10-01 A
# 2: 2 2018-10-03 B
# 3: 3 2018-10-12 C
#list with subset for each month
lapply( unique(month(dt$date)), function(x) dt[ month(date) == x, ])
# [[1]]
# id date dummy
# 1: 1 2018-10-01 A
# 2: 2 2018-10-03 B
# 3: 3 2018-10-12 C
#
# [[2]]
# id date dummy
# 1: 4 2018-11-01 D
# 2: 5 2018-11-02 E
# 3: 6 2018-11-12 F
# 4: 7 2018-11-15 G
#
# [[3]]
# id date dummy
# 1: 8 2018-12-02 H
# 2: 9 2018-12-07 I
# 3: 10 2018-12-12 J
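For intervals other than calendar months, one possible variation on the above is to add a grouping column and split on it. This is a rough sketch under assumptions: the column name period and the variable by_period are made up for illustration, and the "2 weeks" width is just an example value that could be swapped for "week", "month" or "14 days".

```r
library(data.table)

# Sample data mirroring the answer above, with dates already stored as Date
dt <- data.table(
  id = 1:10,
  date = as.Date(c("2018-10-01", "2018-10-03", "2018-10-12", "2018-11-01",
                   "2018-11-02", "2018-11-12", "2018-11-15", "2018-12-02",
                   "2018-12-07", "2018-12-12")),
  dummy = LETTERS[1:10]
)

# Label each row with the start date of its two-week interval;
# as.character() avoids carrying unused factor levels for empty intervals
dt[, period := as.character(cut(date, breaks = "2 weeks"))]

# One data.table per interval, named by the interval start date
by_period <- split(dt, by = "period")
```

Keeping the interval label as a column also makes grouped summaries possible without splitting at all, for example dt[, .N, by = period].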