Sorting algorithm for a data.frame in R

Time: 2015-09-07 19:48:28

Tags: r

Here is an example of my data.frame:

Time EventName
1 2015-08-02 09:09:22 logged_in
2 2015-08-02 09:35:38 deauthorize
3 2015-08-02 09:36:06 logged_in
4 2015-08-02 09:40:42 deauthorize
5 2015-08-02 09:40:48 logged_in
6 2015-08-02 09:42:46 deauthorize
7 2015-08-02 09:43:15 deauthorize
8 2015-08-02 09:44:49 deauthorize
9 2015-08-02 09:48:06 logged_in
10 2015-08-02 09:49:43 logged_in
11 2015-08-02 10:12:07 logged_in
12 2015-08-02 11:46:15 deauthorize

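I want to keep only the logged_in / deauthorize pairs (I need this to compute the time between logged_in and deauthorize, but some log entries are missing). So after sorting I would like my table to look like this: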

Time      EventName
1 2015-08-02 09:09:22 logged_in
2 2015-08-02 09:35:38 deauthorize
3 2015-08-02 09:36:06 logged_in
4 2015-08-02 09:40:42 deauthorize
5 2015-08-02 09:40:48 logged_in
6 2015-08-02 09:42:46 deauthorize
11 2015-08-02 10:12:07 logged_in
12 2015-08-02 11:46:15 deauthorize

1 answer:

Answer 0 (score: 2)

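# Factor codes: deauthorize = 1, logged_in = 2, so a diff of -1 marks a deauthorize right after a logged_in
end <- which(c(0,diff(as.numeric(df$EventName))) == -1)
# Keep each such deauthorize row together with the logged_in row just before it
df[sort(c(end-1,end)),]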
#                   Time   EventName
# 1  2015-08-02 09:09:22   logged_in
# 2  2015-08-02 09:35:38 deauthorize
# 3  2015-08-02 09:36:06   logged_in
# 4  2015-08-02 09:40:42 deauthorize
# 5  2015-08-02 09:40:48   logged_in
# 6  2015-08-02 09:42:46 deauthorize
# 11 2015-08-02 10:12:07   logged_in
# 12 2015-08-02 11:46:15 deauthorize

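This is a base R solution that relies on factor coercion. We use the factor to locate the instances of "deauthorize". Factors are usually a pain, but in this case being able to quickly turn the EventName column into a series of 1s and 2s makes the search easier. Look at as.numeric(df$EventName) for reference: with the levels in this data, deauthorize is 1 and logged_in is 2.

With these codes, we need to find the cases where a 2 (logged_in) is immediately followed by a 1 (deauthorize). An efficient way to do that is to take the difference between consecutive elements, which diff(as.numeric(df$EventName)) does for us. For the cases we are looking for, that vector has the value -1. The leading 0 in c(0, diff(...)) realigns the differences with the row numbers, so that end holds the rows of the matching deauthorize events.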
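To make the intermediate steps concrete, here is a minimal sketch on a six-element toy factor. The vector ev and the names codes, steps and keep are made up for illustration and are not part of the original answer; the levels are set explicitly to match the order used in the data (deauthorize first, logged_in second).

```r
# Toy factor with the same level order as the data: deauthorize = 1, logged_in = 2
ev <- factor(c("logged_in", "deauthorize", "deauthorize",
               "logged_in", "logged_in", "deauthorize"),
             levels = c("deauthorize", "logged_in"))

codes <- as.numeric(ev)        # 2 1 1 2 2 1
steps <- c(0, diff(codes))     # 0 -1 0 1 0 -1
end   <- which(steps == -1)    # 2 6: deauthorize directly after logged_in
keep  <- sort(c(end - 1, end)) # 1 2 5 6: each pair's two positions

ev[keep]
```

Position 3 (a second deauthorize in a row) and position 4 (a logged_in followed by another logged_in) are dropped, which mirrors what happens to rows 7 to 10 in the real data.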

Data

df  <- structure(list(Time = c("2015-08-02 09:09:22", "2015-08-02 09:35:38", 
"2015-08-02 09:36:06", "2015-08-02 09:40:42", "2015-08-02 09:40:48", 
"2015-08-02 09:42:46", "2015-08-02 09:43:15", "2015-08-02 09:44:49", 
"2015-08-02 09:48:06", "2015-08-02 09:49:43", "2015-08-02 10:12:07", 
"2015-08-02 11:46:15"), EventName = structure(c(2L, 1L, 2L, 1L, 
2L, 1L, 1L, 1L, 2L, 2L, 2L, 1L), .Label = c("deauthorize", "logged_in"
), class = "factor")), .Names = c("Time", "EventName"), row.names = c("1", 
"2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12"), class = "data.frame")
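
Once only the pairs remain, the elapsed time the question asks about could be computed along these lines. This is only a sketch and not part of the original answer: pairs is a small hypothetical stand-in for the filtered result, it assumes the rows strictly alternate logged_in / deauthorize, and the UTC time zone is an assumption since the logs do not state one.

```r
# Hypothetical stand-in for the filtered result (rows alternate logged_in / deauthorize)
pairs <- data.frame(
  Time = c("2015-08-02 09:09:22", "2015-08-02 09:35:38",
           "2015-08-02 09:36:06", "2015-08-02 09:40:42"),
  EventName = c("logged_in", "deauthorize", "logged_in", "deauthorize"),
  stringsAsFactors = FALSE
)

# Parse the timestamps; the time zone is assumed, not given in the logs
times <- as.POSIXct(pairs$Time, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

login  <- times[pairs$EventName == "logged_in"]
logout <- times[pairs$EventName == "deauthorize"]

# Session length in seconds, one value per pair
session_secs <- as.numeric(difftime(logout, login, units = "secs"))
session_secs
```

For these two pairs the sessions last 1576 and 276 seconds.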