I have time data in a data frame, as shown below:
date day time phone lat lon acc update
6 12/08/2014 Tue 07:25:35PM 9052780809 17.41653 78.40537 3.9 1.406988e+12
44 12/08/2014 Tue 07:26:35PM 9052780809 17.41823 78.40344 3.9 1.406988e+12
114 12/08/2014 Tue 07:28:32PM 9052780809 17.41810 78.39846 3.9 1.406988e+12
152 12/08/2014 Tue 07:29:30PM 9052780809 17.41760 78.39512 3.9 1.406988e+12
188 12/08/2014 Tue 07:30:31PM 9052780809 17.41517 78.39426 3.9 1.406988e+12
223 12/08/2014 Tue 07:31:30PM 9052780809 17.41467 78.39434 3.9 1.406988e+12
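Most readings are 1-2 minutes apart, but in some cases the difference between them is more than 10 minutes, as in the second sample below. When the difference is more than 10 minutes, consecutive readings may even fall on different dates. I want to insert a break after any reading that is followed by a gap of more than 10 minutes, and put the resulting pieces into separate data frames for further processing.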
date day time phone lat lon acc update
145315 16/08/2014 Sat 11:54:57AM 9052780809 17.41377 78.45923 3.9 1.406988e+12
145371 16/08/2014 Sat 11:55:56AM 9052780809 17.41626 78.45750 3.9 1.406988e+12
145426 16/08/2014 Sat 11:56:55AM 9052780809 17.41746 78.45547 4.0 1.406988e+12
162349 16/08/2014 Sat 05:02:51PM 9052780809 17.41562 78.44446 3.9 1.406988e+12
162404 16/08/2014 Sat 05:03:55PM 9052780809 17.41577 78.44113 3.9 1.406988e+12
162452 16/08/2014 Sat 05:04:51PM 9052780809 17.41638 78.43815 3.9 1.406988e+12
The original data has 8 columns and more than 700,000 rows.
Answer 0 (score: 1)
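Just pasting from the comments so that the question has an answer. You can use split (suggested by @docendo discimus) and difftime (from @Laurik) to get the expected dataset.
Assuming "time1" is the time column in the dataset "dat", convert "time1" to class "POSIXlt" with strptime, then use difftime to get the difference in minutes between consecutive elements. Here I drop the last element to get the current readings, dt1[-length(dt1)], and drop the first element to get the next readings, dt1[-1], so that the two vectors line up pairwise. Because current minus next is negative when the times increase, compare the absolute difference against the condition >10. Taking the cumsum of that logical index gives a group number per row, and split on that index returns a list of data.frames (lst). It is probably better to keep working within the list than to create separate data.frame objects.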
dt1 <- strptime(dat$time1, format='%I:%M:%OS%p')
# abs() is needed: current minus next is negative for increasing times
lst <- split(dat, cumsum(c(FALSE, abs(difftime(dt1[-length(dt1)],
                   dt1[-1], units='mins')) > 10)))
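To make the cumsum step easier to follow, here is a minimal sketch that breaks the one-liner into its intermediate values. The vector tm and the names gap, brk and grp are hypothetical and not part of the question's data. The sketch assumes an English/C locale for %p and fixes tz = "UTC" so that daylight-saving changes cannot distort the gaps.

```r
# Hypothetical toy timestamps (not taken from the question's data)
tm <- c("07:25:35PM", "07:26:35PM", "07:58:00PM", "07:59:10PM", "09:30:00PM")

# strptime() fills in today's date; only the differences matter here
dt <- as.POSIXct(strptime(tm, format = "%I:%M:%S%p", tz = "UTC"))

# Gap in minutes between each reading and the one before it
gap <- as.numeric(difftime(dt[-1], dt[-length(dt)], units = "mins"))

# TRUE marks the first reading after a long gap; row 1 never starts a break
brk <- c(FALSE, gap > 10)

# The running count of breaks is a group id for every row
grp <- cumsum(brk)
grp
# 0 0 1 1 2
```

Passing grp as the second argument of split() then yields one data.frame per run of closely spaced readings.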
Using the new dataset dat (its dput is at the end of this answer), with the date and time columns combined:
dt1 <- with(dat, strptime(paste(date, time),
format='%d/%m/%Y %I:%M:%OS%p'))
indx <- cumsum(c(FALSE, abs(difftime(dt1[-length(dt1)], dt1[-1],
           units='mins')) > 10))
split(dat, indx)
#$`0`
# date day time phone lat lon acc update
#6 12/08/2014 Tue 07:25:35PM 9052780809 17.41653 78.40537 3.9 1.406988e+12
#44 12/08/2014 Tue 07:26:35PM 9052780809 17.41823 78.40344 3.9 1.406988e+12
#114 12/08/2014 Tue 07:28:32PM 9052780809 17.41810 78.39846 3.9 1.406988e+12
#152 12/08/2014 Tue 07:29:30PM 9052780809 17.41760 78.39512 3.9 1.406988e+12
#188 12/08/2014 Tue 07:30:31PM 9052780809 17.41517 78.39426 3.9 1.406988e+12
#223 12/08/2014 Tue 07:31:30PM 9052780809 17.41467 78.39434 3.9 1.406988e+12
#$`1`
# date day time phone lat lon acc update
#145315 16/08/2014 Sat 11:54:57AM 9052780809 17.41377 78.45923 3.9 1.406988e+12
#145371 16/08/2014 Sat 11:55:56AM 9052780809 17.41626 78.45750 3.9 1.406988e+12
#145426 16/08/2014 Sat 11:56:55AM 9052780809 17.41746 78.45547 4.0 1.406988e+12
#$`2`
# date day time phone lat lon acc update
#162349 16/08/2014 Sat 05:02:51PM 9052780809 17.41562 78.44446 3.9 1.406988e+12
#162404 16/08/2014 Sat 05:03:55PM 9052780809 17.41577 78.44113 3.9 1.406988e+12
#162452 16/08/2014 Sat 05:04:51PM 9052780809 17.41638 78.43815 3.9 1.406988e+12
dat <- structure(list(date = c("12/08/2014", "12/08/2014", "12/08/2014",
"12/08/2014", "12/08/2014", "12/08/2014", "16/08/2014", "16/08/2014",
"16/08/2014", "16/08/2014", "16/08/2014", "16/08/2014"), day = c("Tue",
"Tue", "Tue", "Tue", "Tue", "Tue", "Sat", "Sat", "Sat", "Sat",
"Sat", "Sat"), time = c("07:25:35PM", "07:26:35PM", "07:28:32PM",
"07:29:30PM", "07:30:31PM", "07:31:30PM", "11:54:57AM", "11:55:56AM",
"11:56:55AM", "05:02:51PM", "05:03:55PM", "05:04:51PM"), phone = c(9052780809,
9052780809, 9052780809, 9052780809, 9052780809, 9052780809, 9052780809,
9052780809, 9052780809, 9052780809, 9052780809, 9052780809),
lat = c(17.41653, 17.41823, 17.4181, 17.4176, 17.41517, 17.41467,
17.41377, 17.41626, 17.41746, 17.41562, 17.41577, 17.41638
), lon = c(78.40537, 78.40344, 78.39846, 78.39512, 78.39426,
78.39434, 78.45923, 78.4575, 78.45547, 78.44446, 78.44113,
78.43815), acc = c(3.9, 3.9, 3.9, 3.9, 3.9, 3.9, 3.9, 3.9,
4, 3.9, 3.9, 3.9), update = c(1.406988e+12, 1.406988e+12,
1.406988e+12, 1.406988e+12, 1.406988e+12, 1.406988e+12, 1.406988e+12,
1.406988e+12, 1.406988e+12, 1.406988e+12, 1.406988e+12, 1.406988e+12
)), .Names = c("date", "day", "time", "phone", "lat", "lon",
"acc", "update"), class = "data.frame", row.names = c("6", "44",
"114", "152", "188", "223", "145315", "145371", "145426", "162349",
"162404", "162452"))
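As a rough illustration of why staying in the list can be convenient, the sketch below splits a small hypothetical data frame (toy, with made-up column selection) and summarises each segment with lapply. It also swaps in diff() on numeric seconds in place of difftime(); that is my substitution, not the answer's method, and is only assumed to be a reasonable choice for a table with several hundred thousand rows.

```r
# Hypothetical mini data set using the same date/time layout as the question
toy <- data.frame(
  date = c("12/08/2014", "12/08/2014", "16/08/2014", "16/08/2014"),
  time = c("07:25:35PM", "07:26:35PM", "11:54:57AM", "11:55:56AM"),
  lat  = c(17.41653, 17.41823, 17.41377, 17.41626),
  stringsAsFactors = FALSE
)

# Full timestamp; tz = "UTC" avoids daylight-saving surprises (an assumption)
stamp <- as.POSIXct(strptime(paste(toy$date, toy$time),
                             format = "%d/%m/%Y %I:%M:%S%p", tz = "UTC"))

# Group id: increases by one after every gap longer than 10 minutes
toy$grp <- cumsum(c(FALSE, diff(as.numeric(stamp)) / 60 > 10))
lst <- split(toy, toy$grp)

# One summary row per segment, computed without leaving the list
summ <- do.call(rbind, lapply(lst, function(d) {
  data.frame(grp = d$grp[1], n = nrow(d), mean_lat = mean(d$lat))
}))
summ
```

Keeping the group id as a column (toy$grp) also means the pieces can be recombined later with do.call(rbind, lst) without losing track of which segment each row came from.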