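I am trying to add a column to an existing dataset. The dataset contains three columns: Student (the participant ID), Week (the week of the year in which the data were collected), and Day (the weekday on which the data were collected). The new column I want to create, Obs, should hold an increasing number (from 1 to n) that identifies, for each student, which of that student's testing weeks a row belongs to.
I tried combining group_by with rep, but it does not produce the result I want: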
Week <- c(1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 4, 4)
Day <- c(1, 2, 3, 2, 3, 5, 1, 3, 2, 3, 4, 5)
Student <- c("A", "A", "A", "B", "B", "B", "B", "B", "C", "C", "C", "C")
fake.db <- data.frame(Student, Week, Day)
library(dplyr)
fake.db %>%
group_by(Student) %>%
mutate(Obs = rep(1:length(Student), each = Week))
# Student Week Day Obs
# <fct> <dbl> <dbl> <int>
# 1 A 1 1 1
# 2 A 1 2 2
# 3 A 1 3 3
# 4 B 2 2 1
# 5 B 2 3 2
# 6 B 2 5 3
# 7 B 3 1 4
# 8 B 3 3 5
# 9 C 4 2 1
#10 C 4 3 2
#11 C 4 4 3
#12 C 4 5 4
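What I want is different: for each student, rows from the first week of data collection should get 1, rows from the student's second week of data collection should get 2, and so on: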
# Student Week Day Obs
#1 A 1 1 1
#2 A 1 2 1
#3 A 1 3 1
#4 B 2 2 1
#5 B 2 3 1
#6 B 2 5 1
#7 B 3 1 2
#8 B 3 3 2
#9 C 4 2 1
#10 C 4 3 1
#11 C 4 4 1
#12 C 4 5 1
Answer 0 (score: 4)
One dplyr option:
fake.db %>%
group_by(Student) %>%
mutate(Obs = cumsum(!duplicated(Week)))
# Student Week Day Obs
# <fct> <dbl> <dbl> <int>
# 1 A 1 1 1
# 2 A 1 2 1
# 3 A 1 3 1
# 4 B 2 2 1
# 5 B 2 3 1
# 6 B 2 5 1
# 7 B 3 1 2
# 8 B 3 3 2
# 9 C 4 2 1
#10 C 4 3 1
#11 C 4 4 1
#12 C 4 5 1
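It groups by Student and takes the cumulative sum of !duplicated(Week), which is TRUE only the first time each week appears, so the counter goes up by one at every new week.
Or: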
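To see why this works, here is the same expression applied in base R to a single student's weeks; the vector is simply student B's Week values copied from the example data.

```r
# Week values for student B in fake.db
week <- c(2, 2, 2, 3, 3)

# TRUE the first time each week appears, FALSE for repeats
first_seen <- !duplicated(week)

# Cumulatively summing the TRUEs numbers the weeks 1, 2, ...
obs <- cumsum(first_seen)
print(obs)
# [1] 1 1 1 2 2
```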
fake.db %>%
group_by(Student) %>%
mutate(Obs = with(rle(Week), rep(seq_along(lengths), lengths)))
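It groups by Student and builds a run-length group ID over Week: every run of identical consecutive weeks gets the next integer.
Or: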
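A small base-R sketch of what rle() produces and how rep() turns it into run IDs. The second vector is hypothetical and only illustrates one design difference: a week that reappears after a different week starts a new run here, whereas cumsum(!duplicated(...)) would reuse its earlier number.

```r
# Student B's weeks from the example data
week <- c(2, 2, 2, 3, 3)
runs <- rle(week)
# runs$lengths is c(3, 2); runs$values is c(2, 3)
obs <- rep(seq_along(runs$lengths), runs$lengths)
print(obs)
# [1] 1 1 1 2 2

# Hypothetical non-contiguous repeat of week 2
week2 <- c(2, 3, 2)
obs_rle <- rep(seq_along(rle(week2)$lengths), rle(week2)$lengths)  # 1 2 3
obs_dup <- cumsum(!duplicated(week2))                              # 1 2 2
```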
fake.db %>%
group_by(Student) %>%
mutate(Obs = dense_rank(Week))
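It groups by Student and ranks the values of Week with dense_rank(), so equal weeks share a rank and the ranks have no gaps.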
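For intuition, a dense rank can be sketched in base R by matching each week against the sorted distinct weeks. This is an equivalent formulation assuming no missing values, not dplyr's implementation. The hypothetical vector below is deliberately out of order and skips weeks 3 and 4, to show that a dense rank depends only on the values, not on row order or gaps between weeks.

```r
# Hypothetical student tested in calendar weeks 5 and 2, rows unsorted
week <- c(5, 5, 2, 2, 2)

# Position of each week among the sorted distinct weeks: a dense rank
obs <- match(week, sort(unique(week)))
print(obs)
# [1] 2 2 1 1 1
```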
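Answer 1 (score: 2)
As I understand the question, you want to number each student's weeks starting from that student's first test week. For example, week 2 is student B's first test week, so it gets Obs = 1. This means you can use a grouped mutate: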
library(dplyr)
fake.db <- structure(list(Student = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L), .Label = c("A", "B", "C"), class = "factor"), Week = c(1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 4, 4), Day = c(1, 2, 3, 2, 3, 5, 1, 3, 2, 3, 4, 5)), class = "data.frame", row.names = c(NA, -12L))
fake.db %>%
group_by(Student) %>%
mutate(Obs = Week - min(Week) + 1)
#> # A tibble: 12 x 4
#> # Groups: Student [3]
#> Student Week Day Obs
#> <fct> <dbl> <dbl> <dbl>
#> 1 A 1 1 1
#> 2 A 1 2 1
#> 3 A 1 3 1
#> 4 B 2 2 1
#> 5 B 2 3 1
#> 6 B 2 5 1
#> 7 B 3 1 2
#> 8 B 3 3 2
#> 9 C 4 2 1
#> 10 C 4 3 1
#> 11 C 4 4 1
#> 12 C 4 5 1
Created on 2019-05-10 by the reprex package (v0.2.1)
Answer 2 (score: 2)
Using by:
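Note that Week - min(Week) + 1 counts calendar weeks since each student's first test week, so it matches the requested output only when a student's test weeks are consecutive, as they are in the example data. A base-R sketch with a hypothetical student D who skips week 6 shows the difference from a dense rank; base R's ave() stands in for the grouped mutate here.

```r
# Hypothetical data: student D is tested in weeks 5 and 7
student <- c("B", "B", "D", "D")
week    <- c(2, 3, 5, 7)

# Same idea as Week - min(Week) + 1, computed per student with ave()
obs_offset <- week - ave(week, student, FUN = min) + 1
print(obs_offset)
# [1] 1 2 1 3

# A dense rank instead numbers only the weeks actually tested
obs_rank <- ave(week, student, FUN = function(w) match(w, sort(unique(w))))
print(obs_rank)
# [1] 1 2 1 2
```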
unlist(by(fake.db, fake.db[, 1], function(x) as.numeric(factor(x[, 2]))))
# A1 A2 A3 B1 B2 B3 B4 B5 C1 C2 C3 C4
# 1 1 1 1 1 1 2 2 1 1 1 1
Data:
fake.db <- structure(list(Student = structure(c(1L, 1L, 1L, 2L, 2L, 2L,
2L, 2L, 3L, 3L, 3L, 3L), .Label = c("A", "B", "C"), class = "factor"),
Week = c(1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 4, 4), Day = c(1,
2, 3, 2, 3, 5, 1, 3, 2, 3, 4, 5)), class = "data.frame", row.names = c(NA,
-12L))
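by() returns one piece per Student level, so unlist() yields the values in Student order; they line up with the rows of fake.db only because the data are already sorted by Student. A base-R sketch that keeps the original row order uses ave() instead of by() (a swapped-in technique, not part of this answer), shown on a hypothetical unsorted data frame:

```r
# Hypothetical data frame whose rows are not sorted by Student
df <- data.frame(Student = c("B", "A", "B", "A"), Week = c(3, 1, 2, 1))

# factor() codes each student's distinct weeks 1, 2, ... in sorted order;
# ave() writes those codes back into the original row positions
df$Obs <- ave(df$Week, df$Student, FUN = function(w) as.numeric(factor(w)))
print(df)
```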
Answer 3 (score: 1)
You can check whether the difference between consecutive weeks is non-zero:
fake.db %>%
group_by(Student) %>%
arrange(Week) %>%
mutate(Obs = cumsum(c(1, diff(Week) != 0)))
Or, if the values are not numeric, you can compare each value with its lagged value instead.
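A minimal base-R sketch of comparing each value with the previous one, which works for non-numeric values such as character labels. It assumes each student's rows are already in week order and contain no missing values; the helper name run_id and the week labels are hypothetical, and the vector is shifted by hand rather than with dplyr::lag().

```r
# Hypothetical helper: number runs of identical consecutive values.
# Works for any vector type that supports !=, e.g. character labels.
run_id <- function(x) {
  if (length(x) == 0) return(integer(0))
  # TRUE at the first element and wherever a value differs from the previous one
  changed <- c(TRUE, x[-1] != x[-length(x)])
  cumsum(changed)
}

week_labels <- c("w02", "w02", "w02", "w03", "w03")
print(run_id(week_labels))
# [1] 1 1 1 2 2
```

Inside the dplyr pipeline it could be used per student as mutate(Obs = run_id(Week)) after group_by(Student).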