Find the date on which three movies were released on the same day and store it in the variable date_three
library(plyr)  # count() on a vector, returning columns x and freq, is plyr::count
releasedate <- count(bollywood$Rdate)
> releasedate
x freq
1 01-05-2015 1
2 02-10-2015 2
3 03-07-2015 1
4 04-09-2015 1
5 04-12-2015 1
6 05-06-2015 1
7 06-02-2015 1
8 06-03-2015 1
9 07-08-2015 1
10 08-05-2015 2
11 09-01-2015 1
12 09-10-2015 1
13 10-04-2015 1
14 11-09-2015 1
15 12-06-2015 1
16 12-11-2015 1
17 13-02-2015 1
18 13-03-2015 1
19 14-08-2015 1
20 15-05-2015 1
21 16-01-2015 1
22 16-10-2015 1
23 17-04-2015 1
24 17-07-2015 1
25 18-09-2015 1
26 18-12-2015 2
27 19-06-2015 1
28 20-02-2015 1
29 20-03-2015 1
30 21-08-2015 2
31 22-05-2015 1
32 22-10-2015 1
33 23-01-2015 2
34 25-09-2015 2
35 26-06-2015 1
36 27-02-2015 2
37 27-11-2015 1
38 28-05-2015 1
39 28-08-2015 1
40 30-01-2015 2
41 30-10-2015 3
42 31-07-2015 1
> subset(releasedate$x, releasedate$freq == 3)
[1] 30-10-2015
42 Levels: 01-05-2015 02-10-2015 03-07-2015 04-09-2015 04-12-2015 ... 31-07-2015
Is there another way to search for the elements of a vector by how many times they occur?
Answer 0: (score: 1)
Using dplyr:
library(dplyr)
date_three = bollywood %>% count(Rdate) %>% filter(n >= 3)
Using data.table:
library(data.table)
date_three = setDT(bollywood)[ , list(freq=.N), by = Rdate ][freq >= 3]
Or, slightly more directly:
date_three = setDT(bollywood)[, if (.N >= 3L) .(freq = .N), by = Rdate]
FWIW, here are some timings:
# Fake data
set.seed(2488)
bollywood=data.frame(Rdate=sample(seq(as.Date("2015-01-01"), as.Date("2015-12-31"), "1 day"),
1e6, replace=TRUE))
microbenchmark::microbenchmark(
eipiDplyr = bollywood %>% count(Rdate) %>% filter(n >= 3),
eipiDT = setDT(bollywood)[ , list(freq=.N), by = Rdate ][freq >= 3],
ArunDT = setDT(bollywood)[, if (.N >= 3L) .(freq = .N), by = Rdate],
times=20)
Unit: milliseconds
      expr      min       lq     mean   median       uq      max neval cld
 eipiDplyr 47.76676 51.21090 56.37334 53.48006 62.16901 71.94527    20   b
    eipiDT 43.41946 45.22264 47.57584 46.37179 47.97606 58.91733    20   a
    ArunDT 42.97207 44.62598 47.76645 46.40803 51.46064 56.89516    20   a
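Note that the approaches above return a table of dates with their counts and use >= 3. If only the date itself is wanted as a plain vector, with exactly three releases as the question asks, a base R sketch with table() may be enough. The rdate vector below is made-up toy data standing in for bollywood$Rdate, not the real dataset:

```r
# Hypothetical toy data standing in for bollywood$Rdate
rdate <- c("30-10-2015", "02-10-2015", "30-10-2015",
           "01-05-2015", "02-10-2015", "30-10-2015")

# table() counts how often each distinct value occurs
counts <- table(rdate)

# Keep only the names (dates) whose count is exactly 3
date_three <- names(counts)[counts == 3]
date_three
```

With the real data, replacing rdate by bollywood$Rdate should work the same way; names() returns character strings, so a factor column comes back as plain text rather than as a factor with all 42 levels.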