Combining irregular H:M:S timestamp data into hourly intervals in R

Time: 2017-07-23 18:39:28

Tags: r timestamp forecasting

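Apologies if a similar question has already been answered and I just can't find it! I'm new to R, but determined not to go back to VBA...

My problem is getting data ready for forecasting with ses. I have a set of ticket data (~25,000 entries) with timestamps, imported from Excel: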

      Number             Created        Category  Priority `Incident state` `Reassignment count` Urgency  Impact
   <dbl>              <dttm>           <chr>     <chr>            <chr>                <dbl>   <chr>   <chr>
1      1 2014-07-01 19:16:00 Software/System 5 - Minor           Closed                    0 3 - Low 3 - Low
2      2 2014-07-02 15:27:00 Software/System 5 - Minor           Closed                    0 3 - Low 3 - Low
3      3 2014-07-02 15:27:00 Software/System 5 - Minor           Closed                    0 3 - Low 3 - Low
4      4 2014-07-02 15:27:00 Software/System 5 - Minor           Closed                    0 3 - Low 3 - Low
5      5 2014-07-02 15:28:00 Software/System 5 - Minor           Closed                    0 3 - Low 3 - Low
6      6 2014-07-02 15:29:00 Software/System 5 - Minor           Closed                    0 3 - Low 3 - Low

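Since no tickets are raised outside working hours, the data is not regularly spaced, so I can't just specify a seq(). Before converting to a time series that I can forecast, I need to group the Created column into hourly blocks. I tried rounding the Created column to the hour: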

modelling_messy$Created <- as.POSIXct(modelling_messy$Created,format="%Y/%m/%d %H:%M:%S", tz = "GMT")
modelling_messy$Created <- as.POSIXct(round(modelling_messy$Created, units = "hours"))

This makes my data look the way I want and lets me aggregate() all entries that share the same hourly timestamp, but things go wrong when I use ts():

# A tibble: 2 x 8
  Number             Created        Category Priority `Incident state` `Reassignment count` Urgency  Impact
   <dbl>              <dttm>           <chr>    <dbl>            <chr>                <dbl>   <chr>   <chr>
1      1 2014-07-01 19:00:00 Software/System        5           Closed                    0 3 - Low 3 - Low
2      2 2014-07-02 15:00:00 Software/System        5           Closed                    0 3 - Low 3 - Low

> myts <- ts(modelling_clean[,1:2], start = c(2014-07-01, 1), freq = 1)
> head(myts)
Time Series:
Start = 2006 
End = 2011 
Frequency = 1 
        Group.1 Number
2006 1404241200      1
2007 1404313200      5
2008 1404316800      1
2009 1404907200      8
2010 1404910800     28
2011 1404914400      1

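I know I've messed up ts() somehow, but I can't work out how to fix it! I'd like the time data to stay as "%Y-%m-%d %H:00:00" or some other useful date/hour combination (I'm only covering 2014 to 2017).

Any and all help is greatly appreciated.

Ta very much.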

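EDIT: Thanks for the suggestion. I think this will solve the conversion to a time series, but I'm not sure how to get the data for df$Created from my current tibble (far too much data to code by hand!). I tried the following but got an error: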

> df = data.frame(Created = modelling_messy$Created),stringsAsFactors = F)
Error: unexpected ',' in "df = data.frame(Created = modelling_messy$Created),"
> df$id = seq_along(nrow(df))
Error in df$id = seq_along(nrow(df)) : 
  object of type 'closure' is not subsettable

Thanks in advance!

1 answer:

Answer 0 (score: 1)

You can use the xts package to create an hourly time series, as follows:

library(xts)

# sample data
df = data.frame(Created = c("2014-07-01 19:16:00","2014-07-02 15:27:00","2014-07-02 15:27:00","2014-07-02 15:27:00",
                "2014-07-02 15:28:00","2014-07-02 15:29:00"),stringsAsFactors = F)
df$id = seq_len(nrow(df))  # one id per row (seq_along(nrow(df)) only yields 1, which is then recycled)

# Truncate dates to the hour: parsing stops after %H, so minutes and seconds are dropped
df$Created <- as.POSIXct(df$Created,format="%Y-%m-%d %H", tz = "GMT")


# Let's aggregate and create hourly data
df = aggregate(id ~ Created, df,length)
time_series = data.frame(Created= seq( min(df$Created), max(df$Created),by='1 hour'))
time_series = merge(time_series,df,by="Created",all.x=TRUE)
time_series$id[is.na(time_series$id)]=0

# create timeseries object (xts is already loaded above)
myxts = xts(time_series$id, order.by = time_series$Created)

Output:

                    [,1]
2014-07-01 19:00:00    1
2014-07-01 20:00:00    0
2014-07-01 21:00:00    0
2014-07-01 22:00:00    0
2014-07-01 23:00:00    0
2014-07-02 00:00:00    0
2014-07-02 01:00:00    0
2014-07-02 02:00:00    0
2014-07-02 03:00:00    0
2014-07-02 04:00:00    0
2014-07-02 05:00:00    0
2014-07-02 06:00:00    0
2014-07-02 07:00:00    0
2014-07-02 08:00:00    0
2014-07-02 09:00:00    0
2014-07-02 10:00:00    0
2014-07-02 11:00:00    0
2014-07-02 12:00:00    0
2014-07-02 13:00:00    0
2014-07-02 14:00:00    0
2014-07-02 15:00:00    5

It works!
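Regarding the EDIT in the question: the first error looks like it comes from a stray closing parenthesis before stringsAsFactors, and the second most likely follows from it, because df was never created and the name still refers to base R's df() function (a closure). Here is a minimal sketch of the same steps starting from a column that is already POSIXct, as in the tibble. The created vector is a hypothetical stand-in for modelling_messy$Created, and I am assuming that the format-based parsing above only truncates character input, so trunc() is used instead.

```r
# Hypothetical stand-in for modelling_messy$Created (already POSIXct, not character)
created <- as.POSIXct(c("2014-07-01 19:16:00", "2014-07-02 15:27:00",
                        "2014-07-02 15:27:00", "2014-07-02 15:27:00",
                        "2014-07-02 15:28:00", "2014-07-02 15:29:00"),
                      tz = "GMT")

# Close data.frame() only once, after all of its arguments.
# trunc() returns POSIXlt, so convert back to POSIXct.
df <- data.frame(Created = as.POSIXct(trunc(created, units = "hours")),
                 stringsAsFactors = FALSE)
df$id <- seq_len(nrow(df))

# Count tickets per hour, then fill hours without tickets with 0
counts <- aggregate(id ~ Created, df, length)
hours  <- data.frame(Created = seq(min(counts$Created), max(counts$Created),
                                   by = "1 hour"))
hourly <- merge(hours, counts, by = "Created", all.x = TRUE)
hourly$id[is.na(hourly$id)] <- 0

# As in the code above (needs library(xts)):
# myxts <- xts(hourly$id, order.by = hourly$Created)
```

With the real data, replacing created with modelling_messy$Created should be the only change needed, provided that column is POSIXct.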


Disclaimer: this is my first time playing with time series in R, so there may be other (i.e. better) ways of achieving this.
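As for the ts() call in the question, a likely explanation for Start = 2006 is that the unquoted 2014-07-01 is evaluated as a subtraction (2014 minus 7 minus 1) rather than read as a date. ts() only takes numeric start and frequency values and does not store date-times, which is why a class indexed by time, such as xts, fits this data better. A small sketch of the effect, with counts copied from the Number column of the ts() output in the question:

```r
# An unquoted date is just arithmetic: 2014 - 7 - 1
start_year <- 2014-07-01
print(start_year)

# Counts copied from the Number column printed in the question
counts <- c(1, 5, 1, 8, 28, 1)

# With frequency = 1, ts() labels each observation as one "year" from the start
myts <- ts(counts, start = c(2014-07-01, 1), frequency = 1)
print(myts)
```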