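I'm working with dates in R, and I want to convert each date into a number that says which attempt it was for that participant to pass the test. Some participants made several attempts, while others made only one. Also, some tests were taken years before others, so I don't care about the actual date, only whether it was the first attempt, the second attempt, and so on.
Here is a mock dataset: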
library(dplyr)
library(lubridate)
problem <- tibble(name = c("Britney", "Christina", "Justin", "Britney", "Britney", "Christina", "Christina", "Christina"),
                  score = c(1, 2, 3, 3, 3, 2, 4, 2),
                  date = ymd_hms(c("2019-02-26 00:18:09", "2019-04-26 00:18:09", "2019-02-20 00:18:09", "2018-02-26 00:18:09", "2017-02-26 00:18:09", "2016-02-26 00:18:09", "2015-02-26 00:18:09", "2010-02-26 00:18:09")))
This is what I want to end up with:
solution <- tibble(name = c("Britney", "Christina", "Justin", "Britney", "Britney", "Christina", "Christina", "Christina"),
                   score = c(1, 2, 3, 3, 3, 2, 4, 2),
                   date = ymd_hms(c("2019-02-26 00:18:09", "2019-04-26 00:18:09", "2019-02-20 00:18:09", "2018-02-26 00:18:09", "2017-02-26 00:18:09", "2016-02-26 00:18:09", "2015-02-26 00:18:09", "2010-02-26 00:18:09")),
                   order = c(3, 4, 1, 2, 1, 3, 2, 1))
solution
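Thanks!
Answer 0 (score: 2)
You can group by name and number the rows of each group in reverse order. This relies on the rows of each name already being sorted from newest to oldest date, as they are in the example data: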
library(dplyr)
problem %>%
  group_by(name) %>%
  mutate(order = rev(seq(n())))
which gives:
# A tibble: 8 x 4
# Groups: name [3]
# name score date order
# <chr> <dbl> <dttm> <int>
#1 Britney 1 2019-02-26 00:18:09 3
#2 Christina 2 2019-04-26 00:18:09 4
#3 Justin 3 2019-02-20 00:18:09 1
#4 Britney 3 2018-02-26 00:18:09 2
#5 Britney 3 2017-02-26 00:18:09 1
#6 Christina 2 2016-02-26 00:18:09 3
#7 Christina 4 2015-02-26 00:18:09 2
#8 Christina 2 2010-02-26 00:18:09 1
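A small sketch of why row order matters for the rev(seq(n())) approach. It assumes Britney's three attempts from the question are stored oldest-first instead of newest-first, and it parses the dates with base R's as.POSIXct(tz = "UTC") rather than lubridate::ymd_hms (both give the same UTC times here). The position-based numbering then silently reverses the attempt numbers, while ranking on date does not:

```r
library(dplyr)

# Britney's three attempts from the question, stored oldest-first this time
britney <- tibble(
  name = "Britney",
  date = as.POSIXct(c("2017-02-26 00:18:09", "2018-02-26 00:18:09",
                      "2019-02-26 00:18:09"), tz = "UTC")
)

# Position-based numbering: assumes newest-first rows, so it is wrong here
by_position <- britney %>%
  group_by(name) %>%
  mutate(order = rev(seq(n()))) %>%
  ungroup()

# Rank-based numbering: uses the date itself, so row order does not matter
by_date <- britney %>%
  group_by(name) %>%
  mutate(order = dense_rank(date)) %>%
  ungroup()

by_position$order  # 3 2 1 (the 2017 attempt is labelled 3)
by_date$order      # 1 2 3
```

If you prefer the position-based version, sorting first with arrange(name, desc(date)) restores the newest-first assumption.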
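Answer 1 (score: 1)
We can convert date to a factor and coerce it to integer. The factor levels are sorted chronologically, so within each name the integer codes count attempts starting from the earliest: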
library(dplyr)
problem %>%
  group_by(name) %>%
  mutate(n = as.integer(factor(date)))
# A tibble: 8 x 4
# Groups: name [3]
# name score date n
# <chr> <dbl> <dttm> <int>
#1 Britney 1 2019-02-26 00:18:09 3
#2 Christina 2 2019-04-26 00:18:09 4
#3 Justin 3 2019-02-20 00:18:09 1
#4 Britney 3 2018-02-26 00:18:09 2
#5 Britney 3 2017-02-26 00:18:09 1
#6 Christina 2 2016-02-26 00:18:09 3
#7 Christina 4 2015-02-26 00:18:09 2
#8 Christina 2 2010-02-26 00:18:09 1
Or, after grouping by 'name', apply dense_rank to 'date':
problem %>%
  group_by(name) %>%
  mutate(n = dense_rank(date))
# A tibble: 8 x 4
# Groups: name [3]
# name score date n
# <chr> <dbl> <dttm> <int>
#1 Britney 1 2019-02-26 00:18:09 3
#2 Christina 2 2019-04-26 00:18:09 4
#3 Justin 3 2019-02-20 00:18:09 1
#4 Britney 3 2018-02-26 00:18:09 2
#5 Britney 3 2017-02-26 00:18:09 1
#6 Christina 2 2016-02-26 00:18:09 3
#7 Christina 4 2015-02-26 00:18:09 2
#8 Christina 2 2010-02-26 00:18:09 1
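Note: both solutions look only at the 'date' variable and make no other assumptions; in particular, the rows do not need to be sorted.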
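The factor and dense_rank approaches give identical results, but they differ from dplyr's other ranking functions when a participant has two attempts with exactly the same timestamp. The sketch below uses hypothetical data (the name "Ava" and her dates are made up, not taken from the question) to compare dense_rank, factor codes, min_rank and row_number:

```r
library(dplyr)

# Hypothetical data: "Ava" has two attempts recorded with the same timestamp
attempts <- tibble(
  name = c("Ava", "Ava", "Ava"),
  date = as.POSIXct(c("2018-01-01 10:00:00", "2018-01-01 10:00:00",
                      "2019-01-01 10:00:00"), tz = "UTC")
)

ranked <- attempts %>%
  group_by(name) %>%
  mutate(
    dense      = dense_rank(date),           # ties share a number, no gaps
    via_factor = as.integer(factor(date)),   # same result as dense_rank
    minr       = min_rank(date),             # ties share a number, then a gap
    rownum     = row_number(date)            # ties broken by row position
  ) %>%
  ungroup()

print(ranked)
# dense: 1 1 2, via_factor: 1 1 2, minr: 1 1 3, rownum: 1 2 3
```

dense_rank and the factor codes give tied attempts the same number; if every row should get its own attempt number, row_number(date) is the safer choice.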
Answer 2 (score: 1)
Or arrange the data by name and date, then group by name and number the attempts with row_number (note that this reorders the rows):
library(dplyr)
problem %>%
  arrange(name, date) %>%
  group_by(name) %>%
  mutate(order = row_number())
# A tibble: 8 x 4
# Groups: name [3]
# name score date order
# <chr> <dbl> <dttm> <int>
#1 Britney 3 2017-02-26 00:18:09 1
#2 Britney 3 2018-02-26 00:18:09 2
#3 Britney 1 2019-02-26 00:18:09 3
#4 Christina 2 2010-02-26 00:18:09 1
#5 Christina 4 2015-02-26 00:18:09 2
#6 Christina 2 2016-02-26 00:18:09 3
#7 Christina 2 2019-04-26 00:18:09 4
#8 Justin 3 2019-02-20 00:18:09 1
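If you want to keep the original row order instead of rearranging the data, row_number can also take the column to rank directly: within each group it ranks date and breaks ties by row position. A minimal sketch on a four-row subset of the question's data (Britney and Justin only), with dates parsed by base R's as.POSIXct instead of lubridate:

```r
library(dplyr)

# Britney and Justin rows from the question, in their original order
problem <- tibble(
  name  = c("Britney", "Justin", "Britney", "Britney"),
  score = c(1, 3, 3, 3),
  date  = as.POSIXct(c("2019-02-26 00:18:09", "2019-02-20 00:18:09",
                       "2018-02-26 00:18:09", "2017-02-26 00:18:09"),
                     tz = "UTC")
)

out <- problem %>%
  group_by(name) %>%
  mutate(order = row_number(date)) %>%  # rank date within name; rows stay put
  ungroup()

out$order  # 3 1 2 1
```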
Answer 3 (score: 1)
You can use rowid from data.table:
library(data.table)
setDT(problem)
problem[order(date), order := rowid(name)]
Or you can use frank to rank date within each name:
problem[, order := frank(date), name]
Output of either approach:
problem
# name score date order
# 1: Britney 1 2019-02-26 00:18:09 3
# 2: Christina 2 2019-04-26 00:18:09 4
# 3: Justin 3 2019-02-20 00:18:09 1
# 4: Britney 3 2018-02-26 00:18:09 2
# 5: Britney 3 2017-02-26 00:18:09 1
# 6: Christina 2 2016-02-26 00:18:09 3
# 7: Christina 4 2015-02-26 00:18:09 2
# 8: Christina 2 2010-02-26 00:18:09 1
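For completeness, a base R sketch that needs neither dplyr nor data.table (this is not from any of the answers above): ave() applies a function within each group and returns the results in the original row order. It assumes a small subset of the question's data (Christina and Justin) stored in a data.frame, with dates parsed by as.POSIXct:

```r
# Christina and Justin rows from the question (base R only, no packages)
problem <- data.frame(
  name = c("Christina", "Justin", "Christina", "Christina", "Christina"),
  date = as.POSIXct(c("2019-04-26 00:18:09", "2019-02-20 00:18:09",
                      "2016-02-26 00:18:09", "2015-02-26 00:18:09",
                      "2010-02-26 00:18:09"), tz = "UTC")
)

# Convert timestamps to seconds, rank them within each name,
# and let ave() put the ranks back in the original row order
problem$order <- ave(as.numeric(problem$date), problem$name,
                     FUN = function(x) rank(x, ties.method = "first"))

problem$order  # 4 1 3 2 1
```

The result is numeric rather than integer; wrap it in as.integer() if the type matters.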