Question

I have a data frame in R ("samp") that contains student IDs and the exams each student has taken.

    student_id math_exam spanish_exam
       <int>     <dbl>     <dbl>
1          1       0         1
2          2       1         0
3          3       0         0
4          4       1         1

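I want to produce a list that has the student ID and the names of the exams the student took, instead of 0s and 1s. So for student 1 it would show only Spanish exam, but for student 4 it would show math exam, Spanish exam.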

I thought I was close with the replace command, so I ran a basic test to see whether I could replace all the 1s with a column name:

    replace(samp, grepl(1, samp, perl=TRUE), names(samp)[2])

But it replaced everything with the same column name:

   student_id math_exam spanish_exam
 1  math_exam math_exam    math_exam
 2  math_exam math_exam    math_exam
 3  math_exam math_exam    math_exam
 4  math_exam math_exam    math_exam

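I also tried specifying a single column, such as samp$math_exam, but got the same result. Is replace even a good idea here? I'm still fairly new to R, so apologies if I'm asking too much. Any guidance would be great! Thanks.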

Answer 1

Here is one approach: split the data.frame by student, melt each piece into long format, and return only the exams that were taken.

library(tidyr)

xy <- data.frame(student_id = 1:4, math_exam = c(0, 1, 0, 1), spanish_exam = c(1, 0, 0, 1))

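# Store the per-student pieces under a new name so that xy stays a data frame
# (the dplyr version further down reuses xy)
xy_split <- split(xy, xy$student_id)

result <- lapply(xy_split, FUN = function(x) {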
  out <- gather(x, key = exam, value = taken, -student_id)
  out[out$taken == 1, ][, -3]
})

do.call(rbind, result)

    student_id         exam
1            1 spanish_exam
2            2    math_exam
4.1          4    math_exam
4.2          4 spanish_exam

If you prefer a dplyr solution...

library(dplyr)

xy %>%
  group_by(student_id) %>%
  gather(key = exam, value = taken, -student_id) %>%
  filter(taken == 1) %>%
  select(-taken)

Source: local data frame [4 x 2]
Groups: student_id [3]

  student_id         exam
       <int>        <chr>
1          2    math_exam
2          4    math_exam
3          1 spanish_exam
4          4 spanish_exam
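
In tidyr 1.0.0 and later, gather() is marked as superseded in favour of pivot_longer(). A minimal sketch of the same idea with the newer function might look like the following; the result name taken_exams is only introduced here for illustration, and the group_by() step is left out because it is not needed for the filtering.

```r
library(dplyr)
library(tidyr)

# Same example data as above
xy <- data.frame(student_id = 1:4,
                 math_exam = c(0, 1, 0, 1),
                 spanish_exam = c(1, 0, 0, 1))

taken_exams <- xy %>%
  # one row per student/exam pair; column names go to "exam", 0/1 flags to "taken"
  pivot_longer(cols = -student_id, names_to = "exam", values_to = "taken") %>%
  # keep only the exams that were actually taken
  filter(taken == 1) %>%
  select(-taken)

taken_exams
```

Unlike gather(), pivot_longer() keeps the rows of each student together, so the result is ordered by student rather than by exam.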

Answer 2

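We can use the melt function from the reshape2 package to melt the data frame, then keep only the rows where value == 1. Splitting the resulting data frame on student_id gives a list with one element per student, holding the exams that student took, i.e.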

library(reshape2)
d3 <- melt(d1, id.vars = 'student_id')
d3 <- d3[d3$value == 1,][-3]
split(d3, d3$student_id)

#$`1`
#  student_id     variable
#5          1 spanish_exam

#$`2`
#  student_id  variable
#2          2 math_exam

#$`4`
#  student_id     variable
#4          4    math_exam
#8          4 spanish_exam

#You can also split on variable to get a list of exams rather than a list of students, i.e.

split(d3, d3$variable)

#$math_exam
#    student_id  variable
#  2          2 math_exam
#  4          4 math_exam

#$spanish_exam
#  student_id     variable
#5          1 spanish_exam
#8          4 spanish_exam
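
If what you are after is literally a named list of exam names per student (as the question describes), a base R sketch without extra packages could look like this. The names exam_cols, exams_by_student and exams_as_text are made up for this illustration, and the code assumes every column other than student_id is a 0/1 exam flag.

```r
# Same data as d1 in this answer
d1 <- data.frame(student_id = 1:4,
                 math_exam = c(0L, 1L, 0L, 1L),
                 spanish_exam = c(1L, 0L, 0L, 1L))

# Assumption: every column except student_id is an exam indicator
exam_cols <- setdiff(names(d1), "student_id")

# For each row, keep the names of the columns whose value is 1
exams_by_student <- lapply(seq_len(nrow(d1)), function(i) {
  exam_cols[unlist(d1[i, exam_cols]) == 1]
})
names(exams_by_student) <- d1$student_id

# Optional: one comma-separated string per student
exams_as_text <- vapply(exams_by_student, paste, character(1), collapse = ", ")

exams_by_student
exams_as_text
```

Note that, unlike the split() results above, this keeps student 3 in the list with an empty character vector, which may or may not be what you want.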

Data

dput(d1)
structure(list(student_id = 1:4, math_exam = c(0L, 1L, 0L, 1L), spanish_exam = c(1L, 0L, 0L, 1L)), .Names = c("student_id", "math_exam", "spanish_exam"), class = "data.frame", row.names = c("1", "2", "3", "4"))

How to change a data frame in R into a named list?
