Here is a sample of my data.frame:
Time EventName
1 2015-08-02 09:09:22 logged_in
2 2015-08-02 09:35:38 deauthorize
3 2015-08-02 09:36:06 logged_in
4 2015-08-02 09:40:42 deauthorize
5 2015-08-02 09:40:48 logged_in
6 2015-08-02 09:42:46 deauthorize
7 2015-08-02 09:43:15 deauthorize
8 2015-08-02 09:44:49 deauthorize
9 2015-08-02 09:48:06 logged_in
10 2015-08-02 09:49:43 logged_in
11 2015-08-02 10:12:07 logged_in
12 2015-08-02 11:46:15 deauthorize
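I want to keep only the logged_in and deauthorize pairs. I need them to compute the time between each logged_in and the deauthorize that follows it, but some log entries are missing. So after filtering, I'd like my table to look like this: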
Time EventName
1 2015-08-02 09:09:22 logged_in
2 2015-08-02 09:35:38 deauthorize
3 2015-08-02 09:36:06 logged_in
4 2015-08-02 09:40:42 deauthorize
5 2015-08-02 09:40:48 logged_in
6 2015-08-02 09:42:46 deauthorize
11 2015-08-02 10:12:07 logged_in
12 2015-08-02 11:46:15 deauthorize
Answer 0 (score: 2)
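# Rows where the factor code drops from 2 (logged_in) to 1 (deauthorize),
# i.e. every deauthorize that directly follows a logged_in
end <- which(c(0, diff(as.numeric(df$EventName))) == -1)
# Keep each of those deauthorize rows together with the logged_in row before it
df[sort(c(end - 1, end)), ]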
# Time EventName
# 1 2015-08-02 09:09:22 logged_in
# 2 2015-08-02 09:35:38 deauthorize
# 3 2015-08-02 09:36:06 logged_in
# 4 2015-08-02 09:40:42 deauthorize
# 5 2015-08-02 09:40:48 logged_in
# 6 2015-08-02 09:42:46 deauthorize
# 11 2015-08-02 10:12:07 logged_in
# 12 2015-08-02 11:46:15 deauthorize
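Since the question's stated goal is the time between each logged_in and its deauthorize, here is a minimal sketch of how the paired rows might be turned into durations. It assumes the timestamps are in a single time zone (UTC is used here purely as an assumption) and rebuilds the sample data compactly so the snippet runs on its own. The names login, logout and sessions are illustrative, not part of the original answer.

```r
# Compact rebuild of the sample data (same values as in the question)
df <- data.frame(
  Time = c("2015-08-02 09:09:22", "2015-08-02 09:35:38", "2015-08-02 09:36:06",
           "2015-08-02 09:40:42", "2015-08-02 09:40:48", "2015-08-02 09:42:46",
           "2015-08-02 09:43:15", "2015-08-02 09:44:49", "2015-08-02 09:48:06",
           "2015-08-02 09:49:43", "2015-08-02 10:12:07", "2015-08-02 11:46:15"),
  EventName = factor(c("logged_in", "deauthorize", "logged_in", "deauthorize",
                       "logged_in", "deauthorize", "deauthorize", "deauthorize",
                       "logged_in", "logged_in", "logged_in", "deauthorize")),
  stringsAsFactors = FALSE
)

# Same index as in the answer: each deauthorize that directly follows a logged_in
end <- which(c(0, diff(as.numeric(df$EventName))) == -1)

# Assumption: all timestamps share one time zone; UTC is an arbitrary choice
login  <- as.POSIXct(df$Time[end - 1], tz = "UTC")
logout <- as.POSIXct(df$Time[end], tz = "UTC")

# One row per session, with its length in minutes
sessions <- data.frame(
  login   = login,
  logout  = logout,
  minutes = as.numeric(difftime(logout, login, units = "mins"))
)
sessions
```

Passing units = "mins" to difftime() keeps every duration in the same unit. Otherwise difftime() chooses a unit automatically based on the size of the differences.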
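The answer's code is a base R solution that relies on factor coercion. We use the factor to find the instances of "deauthorize". Factors are usually a pain, but here being able to quickly turn the EventName column into a series of 1s and 2s makes the search fast. Look at as.numeric(df$EventName) for reference.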
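As a small illustration of that coercion, the sketch below uses a hypothetical vector named events as a stand-in for the first six values of df$EventName. Note that since R 4.0 data.frame() no longer converts strings to factors by default, so a character column would need to be wrapped in factor() first. Calling as.numeric() directly on the strings would give NA.

```r
# Hypothetical stand-in for the first six values of df$EventName
events <- factor(c("logged_in", "deauthorize", "logged_in",
                   "deauthorize", "logged_in", "deauthorize"))

# Levels are sorted alphabetically, so "deauthorize" is 1 and "logged_in" is 2
levels(events)

# The underlying integer codes, one per event
codes <- as.numeric(events)
codes
# 2 1 2 1 2 1
```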
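With these codes, we need to find the cases where a 2 (logged_in) is immediately followed by a 1 (deauthorize). An efficient way to do that is to take the difference between consecutive elements, and diff(as.numeric(df$EventName)) does that for us. Each step from logged_in to deauthorize appears in that vector as -1, which is the value we search for.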
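To make the diff step concrete, here is a walk-through on the twelve events from the question. The vector events is a hypothetical stand-in for df$EventName, and keep is an illustrative name for the final row index.

```r
# Hypothetical stand-in for df$EventName (all twelve events, in order)
events <- factor(c("logged_in", "deauthorize", "logged_in", "deauthorize",
                   "logged_in", "deauthorize", "deauthorize", "deauthorize",
                   "logged_in", "logged_in", "logged_in", "deauthorize"))

codes <- as.numeric(events)
codes
# 2 1 2 1 2 1 1 1 2 2 2 1

# Difference between each element and the one before it; the leading 0 pads
# the result so that position i lines up with row i of the data
steps <- c(0, diff(codes))
steps
# 0 -1 1 -1 1 -1 0 0 1 0 0 -1

# -1 marks a deauthorize that directly follows a logged_in
end <- which(steps == -1)
end
# 2 4 6 12

# Add the row before each hit (the matching logged_in) and restore row order
keep <- sort(c(end - 1, end))
keep
# 1 2 3 4 5 6 11 12
```

Repeated deauthorize rows (7 and 8) give a difference of 0, and so do the logged_in rows (9 and 10) that are followed by another logged_in, so none of them are selected.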
Data
df <- structure(list(Time = c("2015-08-02 09:09:22", "2015-08-02 09:35:38",
"2015-08-02 09:36:06", "2015-08-02 09:40:42", "2015-08-02 09:40:48",
"2015-08-02 09:42:46", "2015-08-02 09:43:15", "2015-08-02 09:44:49",
"2015-08-02 09:48:06", "2015-08-02 09:49:43", "2015-08-02 10:12:07",
"2015-08-02 11:46:15"), EventName = structure(c(2L, 1L, 2L, 1L,
2L, 1L, 1L, 1L, 2L, 2L, 2L, 1L), .Label = c("deauthorize", "logged_in"
), class = "factor")), .Names = c("Time", "EventName"), row.names = c("1",
"2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12"), class = "data.frame")