Collating values from single rows in an R data frame with dplyr

Asked: 2016-03-09 22:22:12

Tags: r dplyr

I have some student exam points data:

     MAPPING PupilMatchingRefAnonymous POINTS 
1    PHYS        1                      60  
2    COMP        1                      40  
3    ENGL        1                      20  
4    MATH        1                      80

I would like to add each student's Maths and English points to every exam row, to make comparison easy:

  MAPPING PupilMatchingRefAnonymous POINTS  MATH    ENGL
1    PHYS        1                      60  80      20
2    COMP        1                      40  80      20
3    ENGL        1                      20  80      20
4    MATH        1                      80  80      20

I have tried the following code, but with no luck:

comResults %>%
    select(MAPPING, PupilMatchingRefAnonymous, POINTS) %>%
    group_by(PupilMatchingRefAnonymous) %>% 
    mutate(MATH=ifelse(MAPPING=="MATH", POINTS, NA))

  Error: incompatible types, expecting a numeric vector

Any idea what I should try?

3 answers:

Answer 0 (score: 3)

Using base R, this looks pretty straightforward:

df[as.character(df$MAPPING)] <- rep(df$POINTS, each = nrow(df))
df
#   MAPPING PupilMatchingRefAnonymous POINTS PHYS COMP ENGL MATH
# 1    PHYS                         1     60   60   40   20   80
# 2    COMP                         1     40   60   40   20   80
# 3    ENGL                         1     20   60   40   20   80
# 4    MATH                         1     80   60   40   20   80
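
The one-liner above appears to rely on the data holding a single student, since it creates one column per row. For several students, a grouped lookup is one possible alternative. The following is only a sketch, assuming dplyr is installed, that `comResults` holds the two-student test data posted in another answer (rebuilt here so the block is self-contained), and that each student has at most one row per subject:

```r
library(dplyr)

# Assumed stand-in for the asker's comResults (two students, four exams each)
comResults <- data.frame(
  MAPPING = c("PHYS", "COMP", "ENGL", "MATH", "PHYS", "COMP", "ENGL", "MATH"),
  PupilMatchingRefAnonymous = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L),
  POINTS = c(60L, 40L, 20L, 80L, 20L, 40L, 0L, 80L)
)

res <- comResults %>%
  group_by(PupilMatchingRefAnonymous) %>%
  # Within each student, pick that student's MATH / ENGL points;
  # [1] keeps the type consistent and yields NA if the subject is missing
  mutate(MATH = POINTS[MAPPING == "MATH"][1],
         ENGL = POINTS[MAPPING == "ENGL"][1]) %>%
  ungroup()

res
```

Indexing with `[1]` always returns an integer (possibly `NA`), which avoids mixing a logical `NA` with numeric points between groups.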

Answer 1 (score: 2)

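I'm not sure how dplyr handles merges, but this base-R solution delivers the result (apart from the column names, which should be fairly simple to fix):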

merge(merge(dat, dat[dat$MAPPING=="MATH", -1], by='PupilMatchingRefAnonymous'),
      dat[dat$MAPPING=="ENGL", -1] , by='PupilMatchingRefAnonymous')
#--------
  PupilMatchingRefAnonymous MAPPING POINTS.x POINTS.y POINTS
1                         1    PHYS       60       80     20
2                         1    COMP       40       80     20
3                         1    ENGL       20       80     20
4                         1    MATH       80       80     20

Here is a two-student dataset for further testing:

 dput(dat)
structure(list(MAPPING = structure(c(4L, 1L, 2L, 3L, 4L, 1L, 
2L, 3L), .Label = c("COMP", "ENGL", "MATH", "PHYS"), class = "factor"), 
    PupilMatchingRefAnonymous = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 
    2L), POINTS = c(60L, 40L, 20L, 80L, 20L, 40L, 0L, 80L)), .Names = c("MAPPING", 
"PupilMatchingRefAnonymous", "POINTS"), class = "data.frame", row.names = c(NA, 
-8L))
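
One way to fix the column names left by the nested merge is to rename the lookup columns before merging. This is a minimal sketch in base R, assuming the two-student `dat` above (rebuilt with `data.frame()` so the block runs on its own); the helper names `key`, `math` and `engl` are made up for illustration:

```r
# Same values as the dput() output, with MAPPING as plain strings
dat <- data.frame(
  MAPPING = c("PHYS", "COMP", "ENGL", "MATH", "PHYS", "COMP", "ENGL", "MATH"),
  PupilMatchingRefAnonymous = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L),
  POINTS = c(60L, 40L, 20L, 80L, 20L, 40L, 0L, 80L)
)

key <- "PupilMatchingRefAnonymous"

# One-row-per-student lookup tables, with POINTS renamed up front
math <- setNames(dat[dat$MAPPING == "MATH", c(key, "POINTS")], c(key, "MATH"))
engl <- setNames(dat[dat$MAPPING == "ENGL", c(key, "POINTS")], c(key, "ENGL"))

# Same nested merge as before, now without the .x/.y suffixes
res <- merge(merge(dat, math, by = key), engl, by = key)
res
```

Note that `merge()` sorts by the key column by default, so the row order may differ from the input.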

Answer 2 (score: 1)

I think you are trying to convert the data from long format to wide format, right?

If so, try this:

library(tidyr)
new.df <- comResults %>%
  spread(MAPPING, POINTS)

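This gives one row per student, with all of that student's subject points on the same row. I know you only want Maths and English, but perhaps this code can put you on the right track.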
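
To get from the wide table back to the layout asked for in the question, the MATH and ENGL columns could be joined onto the original long data. This is a rough sketch rather than a definitive solution, assuming dplyr and tidyr are installed and that `comResults` looks like the two-student test data from the other answer (rebuilt here); the name `result` is made up for illustration:

```r
library(dplyr)
library(tidyr)

# Assumed stand-in for the asker's comResults
comResults <- data.frame(
  MAPPING = c("PHYS", "COMP", "ENGL", "MATH", "PHYS", "COMP", "ENGL", "MATH"),
  PupilMatchingRefAnonymous = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L),
  POINTS = c(60L, 40L, 20L, 80L, 20L, 40L, 0L, 80L)
)

# Long to wide: one row per student, one column per subject
new.df <- comResults %>%
  spread(MAPPING, POINTS)

# Keep only the two subjects of interest and attach them to every exam row
result <- comResults %>%
  left_join(select(new.df, PupilMatchingRefAnonymous, MATH, ENGL),
            by = "PupilMatchingRefAnonymous")

result
```

`left_join()` keeps the rows of `comResults` in their original order, so each exam row simply gains the student's MATH and ENGL points.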