Writing a function in R to iteratively subset a data frame by time

Time: 2018-12-28 23:16:04

Tags: r function dataframe subset

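I am working with a data frame containing cases that span a period of time, say 10/01/18 to 12/31/18. At the moment I have a script that subsets the data by date, but it requires me to type in the specific dates by hand. Here is the script with a dummy dataset: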

> mydata
                  date dummy
1  2018-10-01 21:41:00     A
2  2018-10-03 21:41:00     B
3  2018-10-12 21:41:00     C
4  2018-11-01 21:41:00     D
5  2018-11-02 21:41:00     E
6  2018-11-12 21:41:00     F
7  2018-11-15 21:41:00     G
8  2018-12-02 21:41:00     H
9  2018-12-07 21:41:00     I
10 2018-12-12 21:41:00     J

# parse the date column; the format string must match the raw text shown above
mydata$date <- as.POSIXct(mydata$date, format="%Y-%m-%d %H:%M:%S")

# TOCHANGE: Adjust time points accordingly.
# Half-open intervals, so timestamps on the last day of each month are kept.
t1 <- mydata[mydata$date >= "2018-10-01" & mydata$date < "2018-11-01",]
t2 <- mydata[mydata$date >= "2018-11-01" & mydata$date < "2018-12-01",]
t3 <- mydata[mydata$date >= "2018-12-01" & mydata$date < "2019-01-01",]

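I feel this could be done more efficiently with a function, especially since I want the subsets to cover different lengths of time (e.g., weekly, biweekly, monthly). I was thinking of a function that takes the length of each subset in days as input and then loops over the time span of the whole data frame to generate the subsets. Or would it make more sense to take the number of subsets as input rather than dates?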

How would you write a function that does this? Thanks in advance for your help!

2 Answers:

Answer 0: (score: 0)

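Consider assigning a month variable and then using split to build a list of data frames, which is easier to manage than a set of separate, similarly structured monthly data frames.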

mydata$date <- as.POSIXct(mydata$date, format="%Y-%m-%d %H:%M:%S")
mydata$month <- format(mydata$date,"%m")

month_df_list <- split(mydata, mydata$month)

# OCTOBER DATA FRAME
month_df_list$`10`

# NOVEMBER DATA FRAME
month_df_list$`11`

# DECEMBER DATA FRAME
month_df_list$`12`

Note that a data frame loses none of its functionality when it is stored in a list. To rename the list elements:

month_df_list <- setNames(month_df_list, paste0("t", seq_along(month_df_list)))

# OCTOBER DATA FRAME
month_df_list$t1

# NOVEMBER DATA FRAME
month_df_list$t2

# DECEMBER DATA FRAME
month_df_list$t3
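
The same split idea can be extended to the weekly or biweekly subsets the question asks about. The following is only a minimal sketch: `split_by_period` and its arguments are hypothetical names, not part of the answer above. It relies on base R's `cut()` method for date-times, which accepts interval strings such as "week", "2 weeks" or "month", and it assumes the `date` column is already POSIXct.

```r
# Dummy data matching the question (UTC is an assumption, chosen to keep the example reproducible)
mydata <- data.frame(
  date = as.POSIXct(c("2018-10-01 21:41:00", "2018-10-03 21:41:00",
                      "2018-10-12 21:41:00", "2018-11-01 21:41:00",
                      "2018-11-02 21:41:00", "2018-11-12 21:41:00",
                      "2018-11-15 21:41:00", "2018-12-02 21:41:00",
                      "2018-12-07 21:41:00", "2018-12-12 21:41:00"),
                    tz = "UTC"),
  dummy = LETTERS[1:10]
)

# Hypothetical helper: bin the date column into intervals of width `by`
# and return one data frame per non-empty interval.
split_by_period <- function(df, by = "month", date_col = "date") {
  bins <- cut(df[[date_col]], breaks = by)  # factor of interval start dates
  split(df, bins, drop = TRUE)              # drop = TRUE skips empty intervals
}

monthly  <- split_by_period(mydata, "month")
weekly   <- split_by_period(mydata, "week")
biweekly <- split_by_period(mydata, "2 weeks")

# Number of rows in each monthly subset
sapply(monthly, nrow)
```

Each list element is named after the start of its interval, so an individual subset can be pulled out by name or by position, for example `biweekly[[1]]`.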

Answer 1: (score: 0)

A data.table approach

library( data.table )

Sample data

dt <- fread("id date dummy
1  2018-10-01T21:41:00     A
2  2018-10-03T21:41:00     B
3  2018-10-12T21:41:00     C
4  2018-11-01T21:41:00     D
5  2018-11-02T21:41:00     E
6  2018-11-12T21:41:00     F
7  2018-11-15T21:41:00     G
8  2018-12-02T21:41:00     H
9  2018-12-07T21:41:00     I
10 2018-12-12T21:41:00     J", header = TRUE)

#set dates as Date
dt[, date := as.Date( date, format = "%Y-%m-%dT%H:%M:%S", tz = "Europe/Amsterdam" )]

Subsetting

#subset by month == 10
dt[ month(date) == 10,]

#    id       date dummy
# 1:  1 2018-10-01     A
# 2:  2 2018-10-03     B
# 3:  3 2018-10-12     C

#list with subset for each month
lapply( unique(month(dt$date)), function(x) dt[ month(date) == x, ])

# [[1]]
#    id       date dummy
# 1:  1 2018-10-01     A
# 2:  2 2018-10-03     B
# 3:  3 2018-10-12     C
# 
# [[2]]
#    id       date dummy
# 1:  4 2018-11-01     D
# 2:  5 2018-11-02     E
# 3:  6 2018-11-12     F
# 4:  7 2018-11-15     G
# 
# [[3]]
#    id       date dummy
# 1:  8 2018-12-02     H
# 2:  9 2018-12-07     I
# 3: 10 2018-12-12     J
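
For the variant raised in the question, where the input is the length of each subset in days, one possible sketch is shown below. `split_by_days` is a hypothetical helper, not something data.table provides. It numbers consecutive windows of `n_days` starting from the earliest date in the table and then uses data.table's `split()` method with a grouping vector. The sample table is rebuilt here with `data.table()` so that the block runs on its own.

```r
library(data.table)

# Sample data equivalent to the table above, with the date column already of class Date
dt <- data.table(
  id    = 1:10,
  date  = as.Date(c("2018-10-01", "2018-10-03", "2018-10-12", "2018-11-01",
                    "2018-11-02", "2018-11-12", "2018-11-15", "2018-12-02",
                    "2018-12-07", "2018-12-12")),
  dummy = LETTERS[1:10]
)

# Hypothetical helper: split into consecutive windows of `n_days` days,
# counted from the earliest date in the table.
split_by_days <- function(dt, n_days, date_col = "date") {
  d <- as.integer(as.Date(dt[[date_col]]))    # dates as day counts
  period <- (d - min(d)) %/% n_days + 1L      # window index: 1, 2, 3, ...
  split(dt, f = period)                       # one data.table per non-empty window
}

by_30_days <- split_by_days(dt, 30)
by_7_days  <- split_by_days(dt, 7)

# Number of rows in each 30-day window
sapply(by_30_days, nrow)
```

Windows that contain no rows are simply absent from the result, and the list names are the window indices, so gaps in the numbering indicate empty windows.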