Question

I asked a very similar question here, but I now understand my problem better. I will do my best to ask it clearly.

I have a sample dataset that looks like this:

id <-       c(1,1,1, 2,2,2, 3,3, 4,4, 5,5,5,5, 6,6,6, 7, 8,8, 9,9, 10,10)
item.id <-  c(1,1,2, 1,1,1 ,1,1, 1,2, 1,2,2,2, 1,1,1, 1, 1,2, 1,1, 1,1)
sequence <- c(1,2,1, 1,2,3, 1,2, 1,1, 1,1,2,3, 1,2,3, 1, 1,1, 1,2, 1,2)
score <-    c(0,0,0, 0,0,1, 2,0, 1,1, 1,0,1,1, 0,0,0, 1, 0,2, 1,2, 2,1)

data <- data.frame("id"=id, "item.id"=item.id, "sequence"=sequence, "score"=score)
> data
   id item.id sequence score
1   1       1        1     0
2   1       1        2     0
3   1       2        1     0
4   2       1        1     0
5   2       1        2     0
6   2       1        3     1
7   3       1        1     2
8   3       1        2     0
9   4       1        1     1
10  4       2        1     1
11  5       1        1     1
12  5       2        1     0
13  5       2        2     1
14  5       2        3     1
15  6       1        1     0
16  6       1        2     0
17  6       1        3     0
18  7       1        1     1
19  8       1        1     0
20  8       2        1     2
21  9       1        1     1
22  9       1        2     2
23 10       1        1     2
24 10       1        2     1

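id identifies each student, item.id identifies the question the student answered, sequence is the attempt number within each item.id, and score is the score for each attempt, taking the value 0, 1, or 2. Students can change their answers.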

For each item.id within each id, I want to create a variable (status) by looking at the last two sequences (the last change):

a) assign "WW" for those who changed from wrong to wrong (0 to 0),
b) assign "WR" for those who changed to increasing score (0 to 1, 0 to 2, or 1 to 2),
c) assign "RW" for those who changed to decreasing score (2 to 1, 2 to 0, or 1 to 0), and
d) assign "RR" for those who changed from right to right (1 to 1, 2 to 2).

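A score change from 0 to 1, 0 to 2, or 1 to 2 is considered a correct (right) change, while a score change from 1 to 0, 2 to 0, or 2 to 1 is considered an incorrect (wrong) change.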

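If there is only one attempt for an item.id, as for id = 7, then status should be "one.right" when the score is 1 or 2. If the score is 0, it should be "one.wrong". In other words, a score of 1 or 2 counts as right, and a score of 0 counts as wrong.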
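
The rules above can be sketched as two small base R helpers. The names `classify_change` and `classify_single` are hypothetical (they do not appear in the original post), and the sketch assumes scores are always 0, 1, or 2.

```r
# Hypothetical helper (not from the original post): classify the change
# from the second-to-last score `prev` to the last score `last`.
classify_change <- function(prev, last) {
  if (last > prev) {
    "WR"   # score increased
  } else if (last < prev) {
    "RW"   # score decreased
  } else if (last == 0) {
    "WW"   # stayed at 0
  } else {
    "RR"   # stayed at 1 or stayed at 2
  }
}

# Hypothetical helper: a single attempt is right when the score is 1 or 2.
classify_single <- function(score) {
  if (score > 0) "one.right" else "one.wrong"
}

classify_change(0, 0)  # "WW"
classify_change(1, 2)  # "WR"
classify_change(2, 1)  # "RW"
classify_change(1, 1)  # "RR"
classify_single(0)     # "one.wrong"
```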

The desired output would be:

 > desired
     id item.id    status
  1   1       1        WW
  2   1       2 one.wrong
  3   2       1        WR
  4   3       1        RW
  5   4       1 one.right
  6   4       2 one.right
  7   5       1 one.right
  8   5       2        RR
  9   6       1        WW
  10  7       1 one.right
  11  8       1 one.wrong
  12  8       2 one.right
  13  9       1        WR
  14  10      1        RW

The main difference from the previous version of the question is that I had not considered changes

a) from 1 to 2 as WR, instead, they were coded as RR,
b) from 2 to 1 as RW, instead, they were coded as WW.

The logic is otherwise the same: if the score increases, the status should be WR, and if the score decreases, it should be RW.

The best answer I received was

library(dplyr)
library(purrr)
library(forcats)

data %>% 
  mutate(status = ifelse(score > 0, "R", "W")) %>% 
  group_by(id, item.id) %>% 
  filter(sequence == n() - 1 | sequence == n()) %>%  
  summarise(status = paste(status, collapse = "")) %>% 
  ungroup() %>% 
  mutate(status = fct_recode(status, "one.wrong" = "W", "one.right" = "R"))

But I need to handle increases and decreases in the score pattern.
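
As a quick illustration of the gap (a base R sketch, using the `score > 0` threshold exactly as written in the snippet above and the scores of id 9 and id 10 from the sample data): once scores are collapsed to "R"/"W", a change between 1 and 2 in either direction can no longer be told apart from an unchanged right answer. The variable names `up` and `down` are only for illustration.

```r
# id 9, item.id 1: scores 1 then 2 (an increase, so "WR" is wanted)
up <- paste(ifelse(c(1, 2) > 0, "R", "W"), collapse = "")

# id 10, item.id 1: scores 2 then 1 (a decrease, so "RW" is wanted)
down <- paste(ifelse(c(2, 1) > 0, "R", "W"), collapse = "")

up    # "RR"
down  # "RR"
```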

Any suggestions? Thanks!

Answer 1

Here is a classification for every row:

library(dplyr)
data = data %>%
  group_by(id, item.id) %>%
  mutate(diff = c(0, diff(score)),
         status = case_when(
           n() == 1 & score == 0 ~ "one.wrong",
           n() == 1 & score > 0 ~ "one.right",
           diff == 0 & score == 0 ~ "WW",
           diff == 0 & score > 0 ~ "RR",
           diff > 0 ~ "WR",
           diff < 0 ~ "RW",
           TRUE ~ "oops"
         ))
print.data.frame(data)
#    id item.id sequence score diff    status
# 1   1       1        1     0    0        WW
# 2   1       1        2     0    0        WW
# 3   1       2        1     0    0 one.wrong
# 4   2       1        1     0    0        WW
# 5   2       1        2     0    0        WW
# 6   2       1        3     1    1        WR
# 7   3       1        1     2    0        RR
# 8   3       1        2     0   -2        RW
# 9   4       1        1     1    0 one.right
# 10  4       2        1     1    0 one.right
# 11  5       1        1     1    0 one.right
# 12  5       2        1     0    0        WW
# 13  5       2        2     1    1        WR
# 14  5       2        3     1    0        RR
# 15  6       1        1     0    0        WW
# 16  6       1        2     0    0        WW
# 17  6       1        3     0    0        WW
# 18  7       1        1     1    0 one.right
# 19  8       1        1     0    0 one.wrong
# 20  8       2        1     2    0 one.right
# 21  9       1        1     1    0        RR
# 22  9       1        2     2    1        WR
# 23 10       1        1     2    0        RR
# 24 10       1        2     1   -1        RW

Then we can summarize, taking the last status value in each group:

summarize(data, status = last(status))
# # A tibble: 14 x 3
# # Groups:   id [10]
#       id item.id status   
#    <dbl>   <dbl> <chr>    
#  1     1       1 WW       
#  2     1       2 one.wrong
#  3     2       1 WR       
#  4     3       1 RW       
#  5     4       1 one.right
#  6     4       2 one.right
#  7     5       1 one.right
#  8     5       2 RR       
#  9     6       1 WW       
# 10     7       1 one.right
# 11     8       1 one.wrong
# 12     8       2 one.right       
# 13     9       1 WR       
# 14    10       1 RW

This appears to match your desired output.
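
As a dependency-free cross-check, the same result can be sketched in base R by looking only at the last two scores of each (id, item.id) group. This is an alternative sketch, not the method used above: the helper name `status_from_scores` is hypothetical, and the rows are sorted by sequence first on the assumption that the input might not already be ordered.

```r
# Sample data from the question
data <- data.frame(
  id       = c(1,1,1, 2,2,2, 3,3, 4,4, 5,5,5,5, 6,6,6, 7, 8,8, 9,9, 10,10),
  item.id  = c(1,1,2, 1,1,1, 1,1, 1,2, 1,2,2,2, 1,1,1, 1, 1,2, 1,1, 1,1),
  sequence = c(1,2,1, 1,2,3, 1,2, 1,1, 1,1,2,3, 1,2,3, 1, 1,1, 1,2, 1,2),
  score    = c(0,0,0, 0,0,1, 2,0, 1,1, 1,0,1,1, 0,0,0, 1, 0,2, 1,2, 2,1)
)

# Hypothetical helper: status for one group's scores, given in attempt order
status_from_scores <- function(s) {
  n <- length(s)
  if (n == 1) return(if (s > 0) "one.right" else "one.wrong")
  prev <- s[n - 1]
  last <- s[n]
  if (last > prev) "WR" else if (last < prev) "RW" else if (last == 0) "WW" else "RR"
}

# Sort so that the last two rows of each group are the last two attempts
data <- data[order(data$id, data$item.id, data$sequence), ]

# One row per (id, item.id); the aggregated column holds the status
res <- aggregate(score ~ id + item.id, data = data, FUN = status_from_scores)
names(res)[3] <- "status"
res <- res[order(res$id, res$item.id), ]
res
```

Working from the last two scores directly avoids the intermediate per-row labels, at the cost of not showing how each earlier attempt would be classified.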
