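I am trying to merge two data frames (called d1 and small). I have exported each data frame and made it available here.
The d1 data frame was used to generate the small data frame. I used a series of for/if loops to determine the presence/absence of each species (sps in the d1 dataset) within two-hour periods to generate the small dataset.
What I want to do is take the TRUE/FALSE rows from small and merge them with d1 to get something like this (hypothetical example):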
datetime MUVI80 MUXX80 MICRO80 TAHU80 TAST80 ERDO80 LEAM80 ONZI80 MEME80 MAMO80 sps pp datetime km crossingtype
1 2012-06-19 01:42:00 FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE MICRO 0 2012-06-19 02:19 80 Exploration
2 2012-06-21 21:42:00 FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE MUXX 1 2012-06-21 23:23 80 Unknown
3 2012-07-15 09:42:00 FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE MAMO 0 2012-07-15 11:38 80 Complete
4 2012-07-20 21:42:00 FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE MICRO 0 2012-07-20 22:19 80 Exploration
5 2012-07-29 21:42:00 FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE MICRO 0 2012-07-29 23:03 80 Exploration
6 2012-08-08 23:42:00 FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE MICRO 0 2012-08-07 02:04 80 Complete
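Although both datasets share a common field, datetime, the fields are not in the same format, which causes problems for two reasons:
1. The datetime field is a POSIXct object in small, but not in d1.
2. When creating the datetime field in small, I also made 2-hour time periods (i.e. I asked whether a species was present (TRUE) or absent (FALSE) within each two-hour period). This means the datetime fields will not match exactly between the small and d1 datasets. Instead, the datetime field in d1 is anywhere within 2 hours of the datetime field in small.
So when I try: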
time<-dplyr::full_join(small, d1, by = "datetime")
it obviously does not work.
The error I get is the following:
Error in full_join_impl(x, y, by$x, by$y, suffix$x, suffix$y, check_na_matches(na_matches)) : cannot join a POSIXct object with an object that is not a POSIXct object
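Does anyone have any suggestions for how I could:
1. Check the format of the datetime fields and then coerce them to the same format.
2. Join the two data frames (even though the hours in the datetime fields do not match).
Answer 0 (score: 2)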
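The sqldf package provides the flexibility to handle range-based join scenarios for data frames and tables. Let me demonstrate how sqldf can be used to solve the problem described in the OP.
I started by reading the data from the files shared in the OP.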
library(sqldf)
# Read the data from d1.txt. Pretty straight forward.
d1 <- read.table("d1.txt", header = TRUE, stringsAsFactors = FALSE)
# The datetime column is character. Hence change it to POSIXct
d1$datetime <- as.POSIXct(d1$datetime)
# In small.txt the date and time parts of datetime are not in a single column.
# An extra column, onlytime, is introduced so the time part is read separately.
small <- read.table("small.txt", header = TRUE, stringsAsFactors = FALSE)
# merge onlytime part with date part in datetime column
small$datetime = paste(small$datetime, small$onlytime, sep = " ")
# drop column onlytime
small$onlytime <- NULL
# Now datetime column is character. Hence change it to POSIXct
small$datetime <- as.POSIXct(small$datetime)
# Everything is ready now. Let's join the two data frames.
# small$datetime is at 2-hour intervals and represents data for the past 2 hours.
# Hence a d1 record matches a small row when its datetime lies between
# 2 hours (2*60*60 seconds) before that row's datetime and the row's datetime itself.
time = sqldf("select * from d1
inner join small
on d1.datetime between (small.datetime - 2*60*60) and small.datetime")
head(time, 3)
ID date sps time pp datetime km crossingtype datetime MUVI80 MUXX80 MICRO80 TAHU80 TAST80 ERDO80 LEAM80 ONZI80
1 15185 2012-10-22 MICRO 3:42 0 2012-10-22 03:42:00 80 Unknown 2012-10-22 03:42:00 FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE
2 15187 2012-10-23 MICRO 0:40 0 2012-10-23 00:40:00 80 Unknown 2012-10-23 01:42:00 FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE
3 17018 2012-10-29 MICRO 21:03 0 2012-10-29 21:03:00 80 Unknown 2012-10-29 21:42:00 FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE
The join type can be changed to suit the actual objective of the OP.
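For readers who want to try the idea without the original files, here is a minimal sketch in base R. It is not the sqldf approach above but an equivalent cross-join-then-filter, shown only as an illustration. The toy rows, the UTC time zone, and the column name window_end are assumptions made up for this example and do not come from the OP's data. It also shows one way to check the class of each datetime column before coercing it.

```r
# Toy stand-ins for d1 and small (hypothetical values, not the OP's data)
d1 <- data.frame(
  sps = c("MICRO", "MUXX", "MAMO"),
  datetime = c("2012-06-19 00:19:00", "2012-06-21 20:23:00", "2012-07-15 11:38:00"),
  stringsAsFactors = FALSE
)
small <- data.frame(
  datetime = as.POSIXct(c("2012-06-19 01:42:00", "2012-06-21 21:42:00"), tz = "UTC"),
  MICRO80 = c(TRUE, FALSE),
  MUXX80 = c(FALSE, TRUE)
)

# Check how each datetime column is stored before attempting a join
print(class(d1$datetime))
print(class(small$datetime))

# Coerce the character column so both sides are POSIXct (time zone is an assumption)
d1$datetime <- as.POSIXct(d1$datetime, tz = "UTC")

# Rename the window column so the two datetime columns do not collide
names(small)[names(small) == "datetime"] <- "window_end"

# by = NULL makes merge() return the Cartesian product of the two data frames
pairs <- merge(d1, small, by = NULL)

# Keep d1 records that fall in the 2 hours (2*60*60 seconds) up to window_end
in_window <- pairs$datetime >= pairs$window_end - 2 * 60 * 60 &
  pairs$datetime <= pairs$window_end
joined <- pairs[in_window, ]
print(joined)
```

Like the inner join above, this drops d1 records that fall outside every window. The Cartesian product grows with nrow(d1) * nrow(small), so for large inputs the sqldf query is likely the more practical choice.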