I'm using R and have a data table of people who play some online game.
userId, login, country
132, 2017-01-01, A
133, 2017-01-01, B
133, 2018-01-01, B
432, 2018-01-01, A
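I want to find the number of new users per country in 2018, defined as the number of users who logged in during 2018 but not during 2017. For example, if the table above were the entire data set, country A would have 1 new user in 2018 (user 432), while country B would have 0 new users (because user 133 already logged in during 2017).
What is the fastest way to do this?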
Answer 0 (score: 3)
If the data set is large, data.table is probably the fastest option:
library(data.table)
setDT(data)
data[, login := as.Date(login)]
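# One row per user: earliest login year and the user's (first) country.
# Taking country[1L] avoids repeating a user once per login row.
data[, .(year = min(year(login)), country = country[1L]), by = userId
     ][, sum(year == 2018), by = country]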
country V1
1: A 1
2: B 0
where the data is:
data <- fread("userId, login, country
132, 2017-01-01, A
133, 2017-01-01, B
133, 2018-01-01, B
432, 2018-01-01, A")
Edit: similar logic in dplyr (the result is more verbose):
library(dplyr) # year() here comes from data.table, loaded above
data %>%
mutate(year = year(as.Date(login))) %>%
group_by(userId) %>%
summarise(myear = min(year), country = unique(country)) %>%
group_by(country) %>%
summarise(n_new_users = sum(myear == 2018))
country n_new_users
<chr> <int>
1 A 1
2 B 0
Edit 2: similar logic in base R (perhaps not the best way); a few pipes make it easier to follow:
data$year <- as.integer(substr(data$login, 1, 4))
data %>%
aggregate(year ~ userId + country, ., min) %>%
aggregate(year ~ country, ., function(x) sum(x == 2018))
country year
1 A 1
2 B 0
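The base R variant above still relies on the `%>%` pipe, which is not part of base R itself. A minimal sketch of the same two-step aggregation without any pipe follows; the inlined data frame and the column name `n_new_users` are assumptions made so the example is self-contained.

```r
# Example data from the question, inlined so the sketch runs on its own.
data <- data.frame(
  userId  = c(132, 133, 133, 432),
  login   = c("2017-01-01", "2017-01-01", "2018-01-01", "2018-01-01"),
  country = c("A", "B", "B", "A"),
  stringsAsFactors = FALSE
)

# Extract the login year from the ISO-formatted date string.
data$year <- as.integer(substr(data$login, 1, 4))

# Step 1: earliest login year per userId/country pair.
first_login <- aggregate(year ~ userId + country, data = data, FUN = min)

# Step 2: per country, count users whose earliest login year is 2018.
new_users <- aggregate(year ~ country, data = first_login,
                       FUN = function(x) sum(x == 2018))
names(new_users)[2] <- "n_new_users"  # assumed, more descriptive column name
print(new_users)
```

Because step 1 groups by both `userId` and `country`, a user who appears under two countries would be treated as two separate users; whether that is desirable depends on the data.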
Answer 1 (score: 1)
Here is my take:
library(dplyr)
library(lubridate)
data %>%
mutate(years = year(as.Date(login))) %>%
group_by(userId) %>%
mutate(n = n()) %>% # n > 1 means the user is not new (assumes at most one login row per user per year)
filter(n == 1, years == 2018) %>% # keep single-login users whose login falls in 2018
group_by(country) %>%
count()
Answer 2 (score: 0)
library(dplyr)
text <-
"userId, login, country
132, 2017-01-01, A
133, 2017-01-01, B
133, 2018-01-01, B
432, 2018-01-01, A"
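# strip.white = TRUE drops the space after each comma, so country is "A", not " A"
df <- read.csv(text = text, stringsAsFactors = FALSE, strip.white = TRUE) %>%
  mutate(yr = as.numeric(gsub("-.*", "", login)))
# Users who logged in during 2017
svnt_peeps <- df %>% filter(yr == 2017)
# 2018 logins from users with no 2017 login, counted per country
df %>%
  filter(yr == 2018) %>%
  anti_join(svnt_peeps, "userId") %>%
  group_by(country) %>%
  count()
#> # A tibble: 1 x 2
#> # Groups: country [1]
#> country n
#> <chr> <int>
#> 1 A 1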
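Note that the anti-join result above lists only countries with at least one new user, so country B (0 new users) does not appear, and it counts login rows rather than distinct users. A rough base R sketch that keeps zero counts and counts each user once might look like this; the helper name `count_new_users`, its `target_year` argument, and the extra February login row are hypothetical and not part of the original answers. It also assumes "new" means "no login in any earlier year", which matches the question's 2017/2018 definition for this data.

```r
# Example data from the question plus one HYPOTHETICAL extra row
# (user 432 logging in a second time in 2018) to exercise de-duplication.
df <- data.frame(
  userId  = c(132, 133, 133, 432, 432),
  login   = c("2017-01-01", "2017-01-01", "2018-01-01",
              "2018-01-01", "2018-02-01"),
  country = c("A", "B", "B", "A", "A"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: named integer vector of new users per country.
count_new_users <- function(df, target_year) {
  yr <- as.integer(substr(df$login, 1, 4))
  # Users seen in any year before the target year are not new (assumption).
  returning <- unique(df$userId[yr < target_year])
  is_new <- yr == target_year & !(df$userId %in% returning)
  # De-duplicate so a user with several logins in the target year counts once.
  new_rows <- unique(df[is_new, c("userId", "country")])
  # Tabulating against all country levels keeps countries with zero new users.
  lv <- sort(unique(df$country))
  counts <- tabulate(factor(new_rows$country, levels = lv), nbins = length(lv))
  setNames(counts, lv)
}

result <- count_new_users(df, 2018)
print(result)
```

Returning a named vector keeps the sketch dependency-free; converting it with `data.frame(country = names(result), n = unname(result))` would give a table shaped like the other answers' output.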