Question

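I'm working with dates in R, and I want to convert each date into a number indicating which attempt at passing the test it was for that participant. Some participants made several attempts, while others made only one. Also, some exams were taken years before others, so I don't care about the actual date, only whether it was the first attempt, the second attempt, and so on.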

Here is a mock dataset:

library(dplyr)
library(lubridate)
problem <- tibble(name = c("Britney", "Christina", "Justin", "Britney", "Britney", "Christina", "Christina", "Christina"),
                  score = c(1, 2, 3, 3, 3, 2, 4, 2),
                  date = ymd_hms(c("2019-02-26 00:18:09", "2019-04-26 00:18:09", "2019-02-20 00:18:09", "2018-02-26 00:18:09", "2017-02-26 00:18:09", "2016-02-26 00:18:09", "2015-02-26 00:18:09", "2010-02-26 00:18:09")))

This is what I ultimately want it to look like:

solution <- tibble(name = c("Britney", "Christina", "Justin", "Britney", "Britney", "Christina", "Christina", "Christina"),
                  score = c(1, 2, 3, 3, 3, 2, 4, 2),
                  date = ymd_hms(c("2019-02-26 00:18:09", "2019-04-26 00:18:09", "2019-02-20 00:18:09", "2018-02-26 00:18:09", "2017-02-26 00:18:09", "2016-02-26 00:18:09", "2015-02-26 00:18:09", "2010-02-26 00:18:09")),
                  order = c(3, 4, 1, 2, 1, 3, 2, 1))

solution

Thanks!

Answer 1

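You can group by name and number each participant's rows in reverse order. This relies on each participant's rows already being sorted from newest to oldest, as they are in the sample data: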

library(dplyr)

problem %>% 
 group_by(name) %>% 
 mutate(order = rev(seq(n())))

which gives:

# A tibble: 8 x 4
# Groups:   name [3]
  name      score date                order
  <chr>     <dbl> <dttm>              <int>
1 Britney       1 2019-02-26 00:18:09     3
2 Christina     2 2019-04-26 00:18:09     4
3 Justin        3 2019-02-20 00:18:09     1
4 Britney       3 2018-02-26 00:18:09     2
5 Britney       3 2017-02-26 00:18:09     1
6 Christina     2 2016-02-26 00:18:09     3
7 Christina     4 2015-02-26 00:18:09     2
8 Christina     2 2010-02-26 00:18:09     1
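Because Answer 1 numbers rows by position rather than by date, its result is only correct while each participant's rows stay sorted newest-first. The minimal base R sketch below (using `ave()` instead of dplyr, with hypothetical helper names `positional_order()` and `date_order()`) reproduces that positional logic and compares it with a date-based rank, first on the question's row order and then on a shuffled copy of the same rows.

```r
# The question's data, rebuilt in base R (score column omitted).
problem <- data.frame(
  name = c("Britney", "Christina", "Justin", "Britney", "Britney",
           "Christina", "Christina", "Christina"),
  date = as.POSIXct(c("2019-02-26 00:18:09", "2019-04-26 00:18:09",
                      "2019-02-20 00:18:09", "2018-02-26 00:18:09",
                      "2017-02-26 00:18:09", "2016-02-26 00:18:09",
                      "2015-02-26 00:18:09", "2010-02-26 00:18:09"),
                    tz = "UTC")
)

# Hypothetical helper: Answer 1's idea, numbering rows within each name
# from last to first by position, ignoring the dates themselves.
positional_order <- function(df) {
  ave(seq_len(nrow(df)), df$name, FUN = function(i) rev(seq_along(i)))
}

# Hypothetical helper: rank each participant's dates chronologically.
date_order <- function(df) {
  ave(as.numeric(df$date), df$name, FUN = rank)
}

print(positional_order(problem))
print(date_order(problem))

# Same rows in a different order: only the date-based version still
# numbers each participant's attempts chronologically.
shuffled <- problem[c(8, 1, 3, 5, 2, 7, 4, 6), ]
print(positional_order(shuffled))
print(date_order(shuffled))
```

On the original rows both helpers return 3 4 1 2 1 3 2 1; on the shuffled rows they disagree, and only `date_order()` still matches the dates.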

Answer 2

We can convert the dates to a factor within each name group and coerce that to integer:

library(dplyr)
problem %>% 
    group_by(name) %>% 
    mutate(n = as.integer(factor(date)))
# A tibble: 8 x 4
# Groups:   name [3]
#  name      score date                    n
#  <chr>     <dbl> <dttm>              <int>
#1 Britney       1 2019-02-26 00:18:09     3
#2 Christina     2 2019-04-26 00:18:09     4
#3 Justin        3 2019-02-20 00:18:09     1
#4 Britney       3 2018-02-26 00:18:09     2
#5 Britney       3 2017-02-26 00:18:09     1
#6 Christina     2 2016-02-26 00:18:09     3
#7 Christina     4 2015-02-26 00:18:09     2
#8 Christina     2 2010-02-26 00:18:09     1
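This works because `factor()` sorts its levels by the underlying date values, so each integer code is a chronological rank, and identical dates share one code. A small base R sketch (the example dates are made up for illustration) shows this, along with a pitfall not mentioned in the original answer: if the dates were still day/month/year text, the levels would sort alphabetically instead, so the column should be parsed into a Date or POSIXct first.

```r
# Made-up dates, not from the question; note the repeated 2015-02-26.
d <- as.Date(c("2019-02-26", "2015-02-26", "2017-02-26", "2015-02-26"))
print(as.integer(factor(d)))  # 3 1 2 1: levels sorted chronologically

# The same idea on unparsed "dd/mm/yyyy" text sorts alphabetically instead.
txt <- c("05/01/2019", "20/12/2018")
print(as.integer(factor(txt)))                                # 1 2 (not chronological)
print(as.integer(factor(as.Date(txt, format = "%d/%m/%Y"))))  # 2 1 (chronological)
```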

Or, after grouping by 'name', apply dense_rank to 'date':

problem %>% 
    group_by(name) %>%
    mutate(n = dense_rank(date))
# A tibble: 8 x 4
# Groups:   name [3]
#  name      score date                    n
#  <chr>     <dbl> <dttm>              <int>
#1 Britney       1 2019-02-26 00:18:09     3
#2 Christina     2 2019-04-26 00:18:09     4
#3 Justin        3 2019-02-20 00:18:09     1
#4 Britney       3 2018-02-26 00:18:09     2
#5 Britney       3 2017-02-26 00:18:09     1
#6 Christina     2 2016-02-26 00:18:09     3
#7 Christina     4 2015-02-26 00:18:09     2
#8 Christina     2 2010-02-26 00:18:09     1

Note: both solutions rely only on the 'date' variable and make no other assumptions (for example, about the row order).
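Both of these approaches give tied dates the same number, which matters if a participant could have two records with the same timestamp. The base R sketch below uses made-up dates with one tie to compare three tie rules; the comments name the dplyr functions whose documented behaviour each line mirrors (`dense_rank()`, `min_rank()`, `row_number()`), but the code itself uses only `match()`, `sort()`, `unique()` and `rank()`.

```r
# Made-up attempt dates with a tie on 2016-02-26 (not from the question).
d <- as.Date(c("2015-02-26", "2016-02-26", "2016-02-26", "2019-02-26"))
x <- as.numeric(d)  # compare on the underlying day counts

dense <- match(x, sort(unique(x)))       # no gaps, like dense_rank():           1 2 2 3
min_r <- rank(x, ties.method = "min")    # gap after ties, like min_rank():      1 2 2 4
first <- rank(x, ties.method = "first")  # ties split by position, like row_number(): 1 2 3 4

print(rbind(dense, min_r, first))
```

If two same-day records should count as separate attempts, the `row_number()`-style rule keeps the numbering 1, 2, 3, ...; if they should count as a single attempt, the dense rule fits.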

Answer 3

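Or, sort the data by name and date with arrange, then group by name and number the rows with row_number. Note that the result comes back in the sorted row order rather than the original one: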

library(dplyr)

problem %>%
  arrange(name, date) %>%
  group_by(name) %>%
  mutate(order = row_number())
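If the original row order should be kept, one option (a base R sketch, not part of the original answer) is to compute the same sort order with `order()`, number the rows within each name with `ave(..., FUN = seq_along)`, and write the numbers back to their original positions.

```r
# The question's data, rebuilt in base R (score column omitted).
problem <- data.frame(
  name = c("Britney", "Christina", "Justin", "Britney", "Britney",
           "Christina", "Christina", "Christina"),
  date = as.POSIXct(c("2019-02-26 00:18:09", "2019-04-26 00:18:09",
                      "2019-02-20 00:18:09", "2018-02-26 00:18:09",
                      "2017-02-26 00:18:09", "2016-02-26 00:18:09",
                      "2015-02-26 00:18:09", "2010-02-26 00:18:09"),
                    tz = "UTC")
)

# Row positions sorted by name, then date (the arrange(name, date) step).
o <- order(problem$name, problem$date)

# Number rows 1, 2, ... within each name in that sorted order (the
# row_number() step), then assign them back to the original rows.
problem$order <- NA_integer_
problem$order[o] <- ave(seq_along(o), problem$name[o], FUN = seq_along)

print(problem)
```

This follows the same idea as the data.table answer `problem[order(date), order := rowid(name)]`: number the rows in date order, but store the result in place.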


# A tibble: 8 x 4
# Groups:   name [3]
#   name      score date                order
#   <chr>     <dbl> <dttm>              <int>
#1 Britney       3 2017-02-26 00:18:09     1
#2 Britney       3 2018-02-26 00:18:09     2
#3 Britney       1 2019-02-26 00:18:09     3
#4 Christina     2 2010-02-26 00:18:09     1
#5 Christina     4 2015-02-26 00:18:09     2
#6 Christina     2 2016-02-26 00:18:09     3
#7 Christina     2 2019-04-26 00:18:09     4
#8 Justin        3 2019-02-20 00:18:09     1

Answer 4

You can use rowid in data.table:

library(data.table)
setDT(problem)

problem[order(date), order := rowid(name)]

Or you can use frank to rank the dates within each name:

problem[, order := frank(date), name]

Output of either method:

problem
#         name score                date order
# 1:   Britney     1 2019-02-26 00:18:09     3
# 2: Christina     2 2019-04-26 00:18:09     4
# 3:    Justin     3 2019-02-20 00:18:09     1
# 4:   Britney     3 2018-02-26 00:18:09     2
# 5:   Britney     3 2017-02-26 00:18:09     1
# 6: Christina     2 2016-02-26 00:18:09     3
# 7: Christina     4 2015-02-26 00:18:09     2
# 8: Christina     2 2010-02-26 00:18:09     1
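One detail worth checking with `frank()`: like base R's `rank()`, its default is `ties.method = "average"`, so two attempts with exactly the same timestamp for one participant would both receive a fractional number such as 2.5. `frank()` also accepts `ties.method = "dense"`, which gives tied dates one shared whole number instead. The base R sketch below (made-up timestamps, not from the question) shows the averaging behaviour with `rank()`.

```r
# Made-up timestamps for one participant, with two identical entries.
x <- as.POSIXct(c("2015-02-26 00:18:09", "2016-02-26 00:18:09",
                  "2016-02-26 00:18:09"), tz = "UTC")

avg <- rank(as.numeric(x))  # default ties.method = "average": 1.0 2.5 2.5
print(avg)
```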

Turning dates into an ordinal variable
