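I want to remove rows with a repeated COD, but only those that are close together in time, for example less than 5 minutes apart. One row of each repeated COD that meets the condition should remain, and I would like the row that remains to be the last one. Given this data: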
COD | Time | score | position |
-------|----------------------|---------|----------|
xx4 | 2016-07-19 10:15:30 |5452 | 2454 |
xf5 | 2016-07-19 09:23:30 |5321 | 342 |
xr1 | 2016-07-19 12:15:30 |5232 | 2328 |
xx4 | 2016-07-19 11:20:20 |1322 | 2432 |
xx4 | 2016-07-19 10:18:30 |2344 | 2534 |
xr1 | 2016-07-19 12:17:30 |8676 | 4566 |
xx4 | 2016-07-19 10:15:50 |9445 | 7655 |
The result I am looking for:
COD | Time | score | position |
-------|----------------------|---------|----------|
xx4 | 2016-07-19 10:15:30 |5452 | 2454 |
xf5 | 2016-07-19 09:23:30 |5321 | 342 |
xr1 | 2016-07-19 12:15:30 |5232 | 2328 |
xx4 | 2016-07-19 11:20:20 |1322 | 2432 |
The Time column is of class POSIXct. How can I do this in R?
Answer 0: (score: 1)
You can do this with the dplyr package. Group by COD, then use the lag() function to compare each time with the previous one in the same group.
library(dplyr)

new_data <- orig_data %>%
  group_by(COD) %>%
  arrange(Time) %>%
  # timediff: minutes since the previous row with the same COD (NA for the first row of a group)
  mutate(timediff = as.numeric(difftime(Time, lag(Time), units = "mins")),
         too_soon = !is.na(timediff) & timediff < 5) %>%
  filter(!too_soon) %>%
  select(-timediff, -too_soon)
(Edit: now handles the NA produced for the first row of each group, and uses base::difftime().)
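One caveat about the lag() comparison: each row is checked against the row immediately before it, not against the last row that was actually kept, so a long run of rows spaced 3 minutes apart would be reduced to its first row only. If the intent is instead to keep rows at least 5 minutes after the previously kept row, a minimal base R sketch could look like the following. The helper name keep_spaced and its gap_mins argument are hypothetical, not part of dplyr or of the answer above.

```r
# Hypothetical helper: returns TRUE for the rows to keep within ONE COD group.
# Assumes `times` is a POSIXct vector already sorted in ascending order.
keep_spaced <- function(times, gap_mins = 5) {
  keep <- logical(length(times))
  last_kept <- NULL
  for (i in seq_along(times)) {
    # Keep the first row, and any row at least gap_mins after the last kept row
    if (is.null(last_kept) ||
        as.numeric(difftime(times[i], last_kept, units = "mins")) >= gap_mins) {
      keep[i] <- TRUE
      last_kept <- times[i]
    }
  }
  keep
}

# Four rows spaced 3 minutes apart (made-up times; tz = "UTC" is an assumption)
times <- as.POSIXct(c("2016-07-19 10:15:30", "2016-07-19 10:18:30",
                      "2016-07-19 10:21:30", "2016-07-19 10:24:30"), tz = "UTC")
keep_spaced(times)
# TRUE FALSE TRUE FALSE
```

Inside a dplyr pipeline this could presumably be applied per group with arrange(Time), group_by(COD) and filter(keep_spaced(Time)). On the sample data in the question both approaches give the same four rows.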
Answer 1: (score: 1)
Using dplyr:
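library(dplyr)
df %>%
  group_by(COD) %>%
  arrange(Time) %>%
  # Keep is NA for the first row of each COD, TRUE when at least 5 minutes have passed since the previous row
  mutate(Keep = abs(as.numeric(difftime(Time, lag(Time), units = "mins"))) >= 5) %>%
  filter(is.na(Keep) | Keep) %>%
  select(-Keep)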
Source: local data frame [4 x 4]
Groups: COD [3]
COD Time score position
<fctr> <time> <int> <int>
1 xf5 2016-07-19 09:23:30 5321 342
2 xx4 2016-07-19 10:15:30 5452 2454
3 xx4 2016-07-19 11:20:20 1322 2432
4 xr1 2016-07-19 12:15:30 5232 2328
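To try the pipeline above, the sample data from the question can be rebuilt as a data frame. This is only a sketch: the name df matches the code in this answer, while the UTC time zone and the factor type for COD (suggested by the <fctr> column in the printed output) are assumptions.

```r
# Sample data copied from the table in the question.
# tz = "UTC" and stringsAsFactors = TRUE are assumptions, not stated in the thread.
df <- data.frame(
  COD = c("xx4", "xf5", "xr1", "xx4", "xx4", "xr1", "xx4"),
  Time = as.POSIXct(c("2016-07-19 10:15:30", "2016-07-19 09:23:30",
                      "2016-07-19 12:15:30", "2016-07-19 11:20:20",
                      "2016-07-19 10:18:30", "2016-07-19 12:17:30",
                      "2016-07-19 10:15:50"), tz = "UTC"),
  score = c(5452L, 5321L, 5232L, 1322L, 2344L, 8676L, 9445L),
  position = c(2454L, 342L, 2328L, 2432L, 2534L, 4566L, 7655L),
  stringsAsFactors = TRUE
)

# Inspect the column types before running the pipeline
str(df)
```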