Suppose I have the following data frame, which records the dates on which users at different companies registered for an app:
df <- data.frame(user = c("Tia", "Sam", "Matt", "Brandy", "Joe", "Nariko"),
                 company = c("Intel", "Intel", "Nvidia", "Nvidia", "Nvidia", "Google"),
                 registrationDate = as.Date(c("2015-01-04", "2015-01-04", "2015-01-19",
                                              "2015-01-20", "2015-01-20", "2015-01-25")),
                 stringsAsFactors = FALSE)
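How can I create a vector that gives, for each company, the average time difference between its users' registrations?
I'm having some trouble getting even simple per-company summary statistics for the date variable. For example, when I try to find the latest registration date for each company using dplyr: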
library(dplyr)
df %>%
  group_by(company) %>%
  mutate(maxDate = max(registrationDate))
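I get the maximum date of the entire registrationDate vector, repeated for every row of the data frame. It is as if max() ignores the grouping set up in the dplyr pipeline.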
Answer 0 (score: 1)
Another option, using summarize instead of mutate:
df2 = df %>%
  group_by(company) %>%
  summarize(minDate = min(registrationDate),
            maxDate = max(registrationDate),
            num_users = n())
> df2
Source: local data frame [3 x 4]
  company    minDate    maxDate num_users
    (chr)     (date)     (date)     (int)
1  Google 2015-01-25 2015-01-25         1
2   Intel 2015-01-04 2015-01-04         2
3  Nvidia 2015-01-19 2015-01-20         3
df2$result = difftime(df2$maxDate, df2$minDate, units = "days")/df2$num_users
> df2
Source: local data frame [3 x 5]
  company    minDate    maxDate num_users         result
    (chr)     (date)     (date)     (int)         (dfft)
1  Google 2015-01-25 2015-01-25         1         0 days
2   Intel 2015-01-04 2015-01-04         2         0 days
3  Nvidia 2015-01-19 2015-01-20         3 0.3333333 days
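Dividing the span by the number of users is only one reading of "average time difference". If the mean gap between consecutive registrations within a company is wanted instead, a minimal sketch could look like the following. The names gaps, avgGap and avgGapByCompany are made up for illustration, and returning NA for a company with a single user is an assumption, not something the question specifies.

```r
library(dplyr)

# Same data frame as in the question, repeated so the sketch runs on its own.
df <- data.frame(user = c("Tia", "Sam", "Matt", "Brandy", "Joe", "Nariko"),
                 company = c("Intel", "Intel", "Nvidia", "Nvidia", "Nvidia", "Google"),
                 registrationDate = as.Date(c("2015-01-04", "2015-01-04", "2015-01-19",
                                              "2015-01-20", "2015-01-20", "2015-01-25")),
                 stringsAsFactors = FALSE)

gaps <- df %>%
  group_by(company) %>%
  summarize(
    # Sort the dates, take successive differences in days, and average them.
    # Assumption: a company with one user has no gap, so it gets NA.
    avgGap = if (n() > 1) {
      mean(as.numeric(diff(sort(registrationDate)), units = "days"))
    } else {
      NA_real_
    }
  )

# Collapse to a named numeric vector, one element per company.
avgGapByCompany <- setNames(gaps$avgGap, gaps$company)
```

With n users there are n - 1 consecutive gaps, so this is the same as dividing the span by n - 1 rather than by n.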
Answer 1 (score: 0)
df %>% group_by(company) %>%
  mutate(AvgTime = (max(registrationDate) - min(registrationDate)) / length(company))
    user company registrationDate        AvgTime
1    Tia   Intel       2015-01-04 0.0000000 days
2    Sam   Intel       2015-01-04 0.0000000 days
3   Matt  Nvidia       2015-01-19 0.3333333 days
4 Brandy  Nvidia       2015-01-20 0.3333333 days
5    Joe  Nvidia       2015-01-20 0.3333333 days
6 Nariko  Google       2015-01-25 0.0000000 days
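Since the question asks for a vector, the same span-divided-by-count calculation can also be collapsed to one value per company. This is only a sketch: it repeats the question's df so that it runs on its own, and perCompany, avgTime and avgTimeByCompany are illustrative names that do not appear in the original.

```r
library(dplyr)

# Same data frame as in the question, repeated so the sketch runs on its own.
df <- data.frame(user = c("Tia", "Sam", "Matt", "Brandy", "Joe", "Nariko"),
                 company = c("Intel", "Intel", "Nvidia", "Nvidia", "Nvidia", "Google"),
                 registrationDate = as.Date(c("2015-01-04", "2015-01-04", "2015-01-19",
                                              "2015-01-20", "2015-01-20", "2015-01-25")),
                 stringsAsFactors = FALSE)

perCompany <- df %>%
  group_by(company) %>%
  # Span between first and last registration, in days, divided by group size.
  summarize(avgTime = as.numeric(max(registrationDate) - min(registrationDate),
                                 units = "days") / n())

# Named numeric vector with one element per company.
avgTimeByCompany <- setNames(perCompany$avgTime, perCompany$company)
```

Using summarize rather than mutate yields one row per company, so the resulting vector has three elements instead of six.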