Unwanted NAs when converting time-stamped data into a per-hour, per-day data frame

Asked: 2019-03-31 13:34:12

Tags: r lubridate

I am working with time-stamped data recorded at 80-second intervals, which looks like this:

> head(dataraw)
    GMT_DATE GMT_TIME ACTIVITY_Z
1: 6/19/2018 00:00:00          0
2: 6/19/2018 00:01:20          0
3: 6/19/2018 00:02:40          0
4: 6/19/2018 00:04:00          0
5: 6/19/2018 00:05:20          1
6: 6/19/2018 00:06:40          1

With some help, I managed to write a script that should create a two-way table giving the hourly activity for every hour (0 to 23) of every day present in the dataset, for example:

  > head(act.byHour[1:3])
  hour Activity on 6/19/2018 Activity on 6/20/2018
1    0                    88                    59
2    1                    43                    74
3    2                  4297                  4341
4    3                  3708                  3676
5    4                  1728                  2143
6    5                  2528                  3890

Here is the code I use for this:

> library(lubridate)
> data.byday <- split(dataraw,dataraw$GMT_DATE)
> act.byHour <- Reduce(function(...) merge(..., by = c('hour')), lapply(data.byday,function(df.day)
+ {
+   df.day$hour <- as.numeric(as.difftime(df.day$GMT_TIME,units="mins")) %/% 60
+   act.p.hour <- sapply(split(df.day,df.day$hour),function(df.hour){return(sum(df.hour$ACTIVITY_Z))})
+   hours <- as.integer(c(names(act.p.hour),seq(0,23)[!(0:23 %in% names(act.p.hour))]))
+   act.p.hour <- c(act.p.hour,rep(NA,24-length(act.p.hour)))
+   act.p.hour <- act.p.hour[order(hours)]
+   return(data.frame(hour=hours,activity=act.p.hour))
+ }))
There were 39 warnings (use warnings() to see them)
> names(act.byHour) <- c("hour",paste("Activity on",names(data.byday)))

The code seems to run fine, but sadly I get NA for the last hour of every day.
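For comparison, here is a minimal sketch of the same per-day, per-hour sum that takes the hour straight from the HH characters of GMT_TIME, so no date-time parsing or time-zone handling is involved. The sample rows are made up for illustration, `sums` is just an illustrative name, and whether this removes the NA seen with the real data is an assumption that has not been checked against the full dataset.

```r
# Hypothetical sample in the same shape as dataraw (values are made up).
dataraw <- data.frame(
  GMT_DATE   = c("6/19/2018", "6/19/2018", "6/19/2018", "6/20/2018", "6/20/2018"),
  GMT_TIME   = c("00:00:00", "00:01:20", "23:58:40", "01:00:00", "23:59:20"),
  ACTIVITY_Z = c(0, 1, 5, 2, 3),
  stringsAsFactors = FALSE
)

# Hour of day from the first two characters of "HH:MM:SS".
# A factor with all 24 levels keeps hours that have no rows.
hour <- factor(as.integer(substr(dataraw$GMT_TIME, 1, 2)), levels = 0:23)

# Sum ACTIVITY_Z for every hour x date cell; cells with no rows are NA.
sums <- tapply(dataraw$ACTIVITY_Z,
               list(hour = hour, date = dataraw$GMT_DATE),
               sum)

# Assemble the two-way table with one column per date.
act.byHour <- data.frame(hour = 0:23, unname(sums), check.names = FALSE)
names(act.byHour) <- c("hour", paste("Activity on", colnames(sums)))

head(act.byHour)
```

With this layout an NA in the result can only mean that a given hour has no rows on that day, which makes it easier to tell missing data apart from a problem in the hour computation.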

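I hope someone can tell me what is wrong with my code. I expect the cause to be something simple, but I am still learning R, so I find it challenging. You can find the complete dataset here for reproducibility.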

0 answers:

No answers yet