Question

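I am trying to merge two data frames (called d1 and small). I exported each data frame and made them available here.

The d1 data frame was used to generate the small data frame. I used a series of for/if loops to determine the presence/absence (within two-hour windows) of each species (sps) in the d1 dataset, and that produced the small dataset.

What I want to do is take the TRUE / FALSE rows from small and merge them with d1 to get something like this (hypothetical example):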

         datetime     MUVI80 MUXX80 MICRO80 TAHU80 TAST80 ERDO80 LEAM80 ONZI80 MEME80 MAMO80 sps  pp      datetime        km crossingtype
1 2012-06-19 01:42:00  FALSE  FALSE    TRUE  FALSE  FALSE  FALSE  FALSE FALSE  FALSE  FALSE MICRO  0  2012-06-19 02:19    80  Exploration
2 2012-06-21 21:42:00  FALSE   TRUE   FALSE  FALSE  FALSE  FALSE  FALSE FALSE  FALSE  FALSE MUXX   1  2012-06-21 23:23    80      Unknown
3 2012-07-15 09:42:00  FALSE  FALSE   FALSE  FALSE  FALSE  FALSE  FALSE FALSE  FALSE   TRUE MAMO   0  2012-07-15 11:38    80     Complete
4 2012-07-20 21:42:00  FALSE  FALSE    TRUE  FALSE  FALSE  FALSE  FALSE FALSE  FALSE  FALSE MICRO  0  2012-07-20 22:19    80  Exploration
5 2012-07-29 21:42:00  FALSE  FALSE    TRUE  FALSE  FALSE  FALSE  FALSE FALSE  FALSE  FALSE MICRO  0  2012-07-29 23:03    80  Exploration
6 2012-08-08 23:42:00  FALSE  FALSE    TRUE  FALSE  FALSE  FALSE  FALSE FALSE  FALSE  FALSE MICRO  0  2012-08-07 02:04    80     Complete

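Although the two datasets share a common field, datetime, its format differs between them, which causes problems for two reasons:

The datetime field is a POSIXct object in small, but not in d1.
To create the datetime field in small, I also made 2-hour time bins (i.e. I asked whether a species was present (TRUE) or absent (FALSE) within each two-hour window). This means the datetime fields will not match exactly between the small and d1 datasets. Instead, the datetime field in d1 falls anywhere within the 2 hours of the datetime field in small.

So when I try: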

time<-dplyr::full_join(small, d1, by = "datetime")

it obviously does not work.

The error I get is:

Error in full_join_impl(x, y, by$x, by$y, suffix$x, suffix$y, check_na_matches(na_matches)) : cannot join a POSIXct object with an object that is not a POSIXct object

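Does anyone have suggestions on how to:

Check the format of the two datetime fields and then coerce them to the same format.
Merge the two datasets (even though the hours in the datetime fields do not match).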

Answer 1

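The sqldf package offers the flexibility to handle range-based joins between data.frames and tables. Let me demonstrate how sqldf can be used to solve the problem described in the OP.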
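To make the idea of a range-based join concrete before turning to the real files, here is a minimal base R sketch on made-up data. The data frames `obs` and `bins`, their columns, and the UTC time zone are assumptions for illustration only; they stand in for d1 and small and are not taken from the OP's files.

```r
# Toy stand-ins for d1 (obs) and small (bins); all values are made up.
obs <- data.frame(
  id = 1:3,
  datetime = as.POSIXct(c("2012-06-19 00:10:00", "2012-06-19 02:19:00",
                          "2012-06-19 05:00:00"), tz = "UTC")
)
bins <- data.frame(
  bin_end = as.POSIXct(c("2012-06-19 01:42:00", "2012-06-19 03:42:00"),
                       tz = "UTC"),
  MICRO80 = c(TRUE, FALSE)
)

# Cross join: pair every observation with every bin.
pairs <- merge(obs, bins, by = NULL)

# Keep the pairs where the observation falls in the 2 hours up to bin_end.
# Both ends are inclusive, like SQL BETWEEN, so an observation lying exactly
# on a bin boundary would match two bins.
window <- 2 * 60 * 60  # 2 hours in seconds
matched <- pairs[pairs$datetime >= pairs$bin_end - window &
                 pairs$datetime <= pairs$bin_end, ]
print(matched)
```

The cross join grows as the product of the two row counts, so this is only practical for small inputs; it is meant to show what the range condition does, not to replace the sqldf call.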

I started by reading the data from the files shared in the OP.

library(sqldf)

# Read the data from d1.txt. Pretty straightforward.
d1 <- read.table("d1.txt", header = TRUE, stringsAsFactors = FALSE)

# The datetime column is read as character, so convert it to POSIXct.
d1$datetime <- as.POSIXct(d1$datetime)

# small.txt does not store the date and time in a single field, so an extra
# column, onlytime, is introduced to read the time part separately.
small <- read.table("small.txt", header = TRUE, stringsAsFactors = FALSE)

# Merge the onlytime part with the date part in the datetime column.
small$datetime <- paste(small$datetime, small$onlytime, sep = " ")
# Drop the onlytime column.
small$onlytime <- NULL
# The datetime column is now character, so convert it to POSIXct.
small$datetime <- as.POSIXct(small$datetime)

# Everything is ready now. Let's join the two data frames.
# small$datetime is at 2-hour intervals and represents data for the past
# 2 hours, so matching records from d1 lie between 2 hours (2*60*60 seconds)
# before the time of the current row and that time itself.

time = sqldf("select * from d1
                inner join small
               on d1.datetime between (small.datetime - 2*60*60) and small.datetime")


head(time, 3)
     ID       date   sps  time pp            datetime km crossingtype            datetime MUVI80 MUXX80 MICRO80 TAHU80 TAST80 ERDO80 LEAM80 ONZI80
1 15185 2012-10-22 MICRO  3:42  0 2012-10-22 03:42:00 80      Unknown 2012-10-22 03:42:00  FALSE  FALSE    TRUE  FALSE  FALSE  FALSE  FALSE  FALSE
2 15187 2012-10-23 MICRO  0:40  0 2012-10-23 00:40:00 80      Unknown 2012-10-23 01:42:00  FALSE  FALSE    TRUE  FALSE  FALSE  FALSE  FALSE  FALSE
3 17018 2012-10-29 MICRO 21:03  0 2012-10-29 21:03:00 80      Unknown 2012-10-29 21:42:00  FALSE  FALSE    TRUE  FALSE  FALSE  FALSE  FALSE  FALSE

The join type can be changed to suit what the OP actually needs.
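On the first part of the question (checking the datetime formats before joining), a minimal base R sketch follows. The values `small_dt` and `d1_dt`, the `"%Y-%m-%d %H:%M"` format string, and the UTC time zone are assumptions for illustration; the real columns may need a different format or time zone.

```r
# Hypothetical stand-ins: one value is already POSIXct, the other is character.
small_dt <- as.POSIXct("2012-06-19 01:42:00", tz = "UTC")
d1_dt <- "2012-06-19 02:19"

# Check what each value actually is before attempting a join.
small_is_posix <- inherits(small_dt, "POSIXct")
d1_is_posix <- inherits(d1_dt, "POSIXct")

# Coerce the character value, stating the format and time zone explicitly
# so both sides are parsed the same way.
if (!d1_is_posix) {
  d1_dt <- as.POSIXct(d1_dt, format = "%Y-%m-%d %H:%M", tz = "UTC")
}

# Once both are POSIXct they can be compared or subtracted directly.
gap_mins <- as.numeric(difftime(d1_dt, small_dt, units = "mins"))
print(gap_mins)
```

Stating `tz` on both sides avoids a silent offset when the two columns would otherwise be parsed under different defaults.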
