Merging on a POSIXct object in dplyr

Time: 2017-12-29 17:21:46

Tags: r datetime merge dplyr time-series

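I am trying to merge two data frames (called d1 and small). I exported each data frame and made them available here.

The d1 data frame was used to generate the small data frame. I used a series of for/if loops to determine the presence/absence (within two-hour periods) of each species (sps) in the d1 dataset in order to generate the small dataset.

What I want to do is take the TRUE/FALSE rows from small and merge them with d1 to get something like this (hypothetical example):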

         datetime     MUVI80 MUXX80 MICRO80 TAHU80 TAST80 ERDO80 LEAM80 ONZI80 MEME80 MAMO80 sps  pp      datetime        km crossingtype
1 2012-06-19 01:42:00  FALSE  FALSE    TRUE  FALSE  FALSE  FALSE  FALSE FALSE  FALSE  FALSE MICRO  0  2012-06-19 02:19    80  Exploration
2 2012-06-21 21:42:00  FALSE   TRUE   FALSE  FALSE  FALSE  FALSE  FALSE FALSE  FALSE  FALSE MUXX   1  2012-06-21 23:23    80      Unknown
3 2012-07-15 09:42:00  FALSE  FALSE   FALSE  FALSE  FALSE  FALSE  FALSE FALSE  FALSE   TRUE MAMO   0  2012-07-15 11:38    80     Complete
4 2012-07-20 21:42:00  FALSE  FALSE    TRUE  FALSE  FALSE  FALSE  FALSE FALSE  FALSE  FALSE MICRO  0  2012-07-20 22:19    80  Exploration
5 2012-07-29 21:42:00  FALSE  FALSE    TRUE  FALSE  FALSE  FALSE  FALSE FALSE  FALSE  FALSE MICRO  0  2012-07-29 23:03    80  Exploration
6 2012-08-08 23:42:00  FALSE  FALSE    TRUE  FALSE  FALSE  FALSE  FALSE FALSE  FALSE  FALSE MICRO  0  2012-08-07 02:04    80     Complete

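Although the two datasets share a common field, datetime, the field is formatted differently in each, which causes problems for two reasons:

  1. The datetime field is a POSIXct object in small, but not in d1.
  2. To create the datetime field in small, I also built 2-hour time periods (i.e. I asked whether each species was present (TRUE) or absent (FALSE) within a two-hour window). This means the datetime fields will not match exactly between the small and d1 datasets. Instead, the datetime field in d1 falls anywhere within 2 hours of the datetime field in small.

So when I try: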

    time<-dplyr::full_join(small, d1, by = "datetime")
    

this obviously does not work.

The error I get is:

    Error in full_join_impl(x, y, by$x, by$y, suffix$x, suffix$y, check_na_matches(na_matches)) : cannot join a POSIXct object with an object that is not a POSIXct object
    

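Does anyone have any suggestions for how I can:

  1. Check the format of the different datetime fields and then coerce them to the same format.
  2. Merge the two datasets (even though the hours in the datetime fields do not match).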

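1 Answer:

Answer 0 (score: 2)

sqldf provides the flexibility to handle range-based joins between data.frames and tables. Let me demonstrate how sqldf can be used to solve the problem described in the OP.

I started by reading the data from the files shared in the OP.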

    library(sqldf)

    # Read the data from d1.txt; this is straightforward.
    d1 <- read.table("d1.txt", header = TRUE, stringsAsFactors = FALSE)

    # The datetime column is read as character, so convert it to POSIXct.
    d1$datetime <- as.POSIXct(d1$datetime)

    # small.txt does not hold the date and time together in one column, so an
    # extra column, onlytime, is introduced to read the time part separately.
    small <- read.table("small.txt", header = TRUE, stringsAsFactors = FALSE)

    # Merge the time part (onlytime) with the date part in the datetime column.
    small$datetime <- paste(small$datetime, small$onlytime, sep = " ")
    # Drop the onlytime column, which is no longer needed.
    small$onlytime <- NULL
    # The datetime column is now character, so convert it to POSIXct.
    small$datetime <- as.POSIXct(small$datetime)

    # Everything is ready now, so join the two data frames.
    # small$datetime is at 2-hour intervals and each value represents the
    # preceding 2 hours. A d1 row therefore matches when its datetime lies
    # between 2 hours (2*60*60 seconds) before small.datetime and
    # small.datetime itself.
    time <- sqldf("select * from d1
                    inner join small
                   on d1.datetime between (small.datetime - 2*60*60) and small.datetime")


    head(time, 3)
         ID       date   sps  time pp            datetime km crossingtype            datetime MUVI80 MUXX80 MICRO80 TAHU80 TAST80 ERDO80 LEAM80 ONZI80
    1 15185 2012-10-22 MICRO  3:42  0 2012-10-22 03:42:00 80      Unknown 2012-10-22 03:42:00  FALSE  FALSE    TRUE  FALSE  FALSE  FALSE  FALSE  FALSE
    2 15187 2012-10-23 MICRO  0:40  0 2012-10-23 00:40:00 80      Unknown 2012-10-23 01:42:00  FALSE  FALSE    TRUE  FALSE  FALSE  FALSE  FALSE  FALSE
    3 17018 2012-10-29 MICRO 21:03  0 2012-10-29 21:03:00 80      Unknown 2012-10-29 21:42:00  FALSE  FALSE    TRUE  FALSE  FALSE  FALSE  FALSE  FALSE
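
Note that the joined result carries two columns that are both named datetime (one from d1, one from small), which makes them awkward to refer to by name. Here is a minimal base R sketch of one way to disambiguate them. The data frame joined_toy and its values are made up for illustration and are not the OP's data.

```r
# Toy stand-in for the joined result; the values are invented for illustration.
joined_toy <- data.frame(
  a = c("2012-10-22 03:42:00", "2012-10-23 00:40:00"),
  km = c(80, 80),
  b = c("2012-10-22 03:42:00", "2012-10-23 01:42:00"),
  stringsAsFactors = FALSE
)
# Reproduce the duplicated column name seen in the sqldf output.
names(joined_toy) <- c("datetime", "km", "datetime")

# make.unique() appends .1, .2, ... to repeated names and leaves the first as-is.
names(joined_toy) <- make.unique(names(joined_toy))
print(names(joined_toy))
```

After this step the first datetime column (from d1 in the real result) keeps its name and the second (from small) becomes datetime.1, so each can be selected unambiguously.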

The join type can be changed to suit the actual objective in the OP.
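
As one illustration of changing the join type, and of doing the same range match without SQL, here is a minimal sketch using dplyr's join_by(), which assumes dplyr 1.1.0 or later. The toy data frames, the column names window_start and window_end, and the explicit tz = "UTC" are assumptions made for this example; they are not taken from the OP's files.

```r
library(dplyr)  # join_by() with between() needs dplyr >= 1.1.0

# Toy stand-ins for d1 and small (values are invented for illustration).
d1_toy <- data.frame(
  sps = c("MICRO", "MUXX", "MAMO"),
  datetime = c("2012-06-19 01:10:00", "2012-06-21 20:05:00", "2012-07-01 12:00:00"),
  stringsAsFactors = FALSE
)
small_toy <- data.frame(
  window_end = as.POSIXct(c("2012-06-19 01:42:00", "2012-06-21 21:42:00"), tz = "UTC"),
  MICRO80 = c(TRUE, FALSE),
  MUXX80 = c(FALSE, TRUE)
)

# Step 1: check the classes, then coerce the character column to POSIXct.
print(class(d1_toy$datetime))       # character before coercion
print(class(small_toy$window_end))  # POSIXct, POSIXt
d1_toy$datetime <- as.POSIXct(d1_toy$datetime, tz = "UTC")

# Step 2: make the 2-hour window explicit (POSIXct arithmetic is in seconds).
small_toy$window_start <- small_toy$window_end - 2 * 60 * 60

# Step 3: range join. A d1 row matches when its datetime falls in the window.
window_match <- join_by(between(datetime, window_start, window_end))
inner_result <- inner_join(d1_toy, small_toy, by = window_match)

# Changing the join type: left_join() keeps d1 rows with no matching window.
left_result <- left_join(d1_toy, small_toy, by = window_match)

print(inner_result)
print(left_result)
```

In this toy data the MAMO observation falls outside both windows, so it is dropped by the inner join and kept, with NA in the columns from small_toy, by the left join.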