Rearranging data

Time: 2017-10-06 15:55:00

Tags: r dplyr tidyr tidyverse

I am trying to rearrange a dataset and then sort it by multiple variables. For example, right now I have something that looks like this:

ID   Name          Class 1         Class2       Monday 7-8         Monday 8-9  
1    Brad          Chem            Bio          Monday 7-8         NA
2    Charlene      Acct            NA           NA                 Monday 8-9
3    Carly         Philosophy      Physics      NA                 NA
4    Jess          Chem            Acct         Monday 7-8         Monday 8-9

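I would like to rearrange the data as follows, with one row per class and, under each time slot, the names of the people in that class who have that slot: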

Class               Monday 7-8           Monday 8-9
Acct                Jess                 Charlene, Jess
Bio                 Brad                 NA
Chem                Brad, Jess           Jess
Philosophy          NA                   NA
Physics             NA                   NA

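I have tried splitting all the variables into separate spreadsheets and then merging them, but I can't figure out how to sort the names by class and time, and it is proving difficult to work out. The actual database has about 70 different time options, 80 different people, and 150 different class names (Chem, Bio, etc.), so I can't build the table by hand.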

2 answers:

Answer 0: (score: 0)

Here is some base R code for this task:

dat <- data.frame(
    name=c("Brad", "Charlene", "Carly", "Jess"),
    class1=c("Chem", "Acct", "Philosophy", "Chem"),
    class2=c("Bio", NA, "Physics", "Acct"),
    monday7.8=c("monday7.8", NA, NA, "monday7.8"),
    monday8.9=c(NA, "monday8.9", NA, "monday8.9"),
    stringsAsFactors=FALSE
)
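# Collect every class that appears in either class column (sort() drops NA by default),
# so that no class such as "Bio" is left out of the result
classes <- sort(unique(c(dat[,"class1"], dat[,"class2"])))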
times <- c("monday7.8", "monday8.9")
ret <- expand.grid(class=classes, time=times, stringsAsFactors=FALSE)
one_alloc <- function(cl, tm, dat) {
    idx <- which(!is.na(dat[,tm]) & (dat[,"class1"]==cl | dat[,"class2"]==cl))
    if(length(idx)>0) return(paste(dat[idx,"name"], collapse=", ")) else return(NA)
}
one_alloc <- Vectorize(one_alloc, vectorize.args=c("cl", "tm"))
ret[,"names"] <- one_alloc(cl=ret[,"class"], tm=ret[,"time"], dat=dat)
ret <- reshape(ret, timevar="time", idvar="class", direction="wide")
ret
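
Since the real data has roughly 70 time columns, it may be easier to pick the columns by name pattern than to list them. Below is a minimal base R sketch of that idea. It assumes the class columns all start with `class`, the time columns all start with `monday`, and that names are unique per person; these prefixes and the helper names `class_cols`, `time_cols`, `long` and `result` are my own assumptions, so adjust them to your data.

```r
# Sample data from the question (column names are assumed)
dat <- data.frame(
  name      = c("Brad", "Charlene", "Carly", "Jess"),
  class1    = c("Chem", "Acct", "Philosophy", "Chem"),
  class2    = c("Bio", NA, "Physics", "Acct"),
  monday7.8 = c("monday7.8", NA, NA, "monday7.8"),
  monday8.9 = c(NA, "monday8.9", NA, "monday8.9"),
  stringsAsFactors = FALSE
)

# Select columns by prefix instead of typing them out (assumed prefixes)
class_cols <- grep("^class", names(dat), value = TRUE)
time_cols  <- grep("^monday", names(dat), value = TRUE)

# Long format: one row per (person, class), dropping missing classes
long <- data.frame(
  name  = rep(dat$name, times = length(class_cols)),
  class = unlist(dat[class_cols], use.names = FALSE),
  stringsAsFactors = FALSE
)
long <- long[!is.na(long$class), ]

classes <- sort(unique(long$class))
result  <- data.frame(class = classes, stringsAsFactors = FALSE)

# For each time slot, list the people in each class who have that slot
for (tm in time_cols) {
  available <- dat$name[!is.na(dat[[tm]])]
  result[[tm]] <- vapply(classes, function(cl) {
    who <- long$name[long$class == cl & long$name %in% available]
    if (length(who) > 0) paste(who, collapse = ", ") else NA_character_
  }, character(1), USE.NAMES = FALSE)
}
result
```

The loop runs once per time column, so the same code should cover 2 or 70 slots without changes, as long as the prefixes match.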

Answer 1: (score: 0)

A tidyr solution:

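library(dplyr)
library(tidyr)

# df1 is assumed to be the question's data read into R, with column names
# Class.1, Class2, Monday.7.8 and Monday.8.9
df1 %>%
  gather(class_col,Class,'Class.1','Class2') %>%   # one row per person and class
  filter(!is.na(Class)) %>%
  gather(date_col,date,'Monday.7.8','Monday.8.9') %>%   # one row per person, class and time slot
  group_by(Class,date) %>%
  summarize(Name = paste(Name,collapse=", ")) %>%
  spread(date,Name) %>%
  select(-`<NA>`)   # drop the column built from missing time slots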

# # A tibble: 5 x 3
# # Groups:   Class [5]
#          Class `Monday 7-8`   `Monday 8-9`
#   *      <chr>        <chr>          <chr>
#   1       Acct         Jess Charlene, Jess
#   2        Bio         Brad           <NA>
#   3       Chem   Brad, Jess           Jess
#   4 Philosophy         <NA>           <NA>
#   5    Physics         <NA>           <NA>
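
`gather()` and `spread()` have since been superseded in tidyr by `pivot_longer()` and `pivot_wider()`. Here is a rough sketch of the same idea with the newer functions, assuming tidyr >= 1.0.0 and dplyr >= 1.0.0. The definition of `df1` is my reconstruction of the question's data, and selecting columns with `starts_with()` assumes the real class and time columns share those prefixes.

```r
library(dplyr)
library(tidyr)

# Reconstruction of the question's data (column names are assumed)
df1 <- data.frame(
  ID         = 1:4,
  Name       = c("Brad", "Charlene", "Carly", "Jess"),
  Class.1    = c("Chem", "Acct", "Philosophy", "Chem"),
  Class2     = c("Bio", NA, "Physics", "Acct"),
  Monday.7.8 = c("Monday 7-8", NA, NA, "Monday 7-8"),
  Monday.8.9 = c(NA, "Monday 8-9", NA, "Monday 8-9"),
  stringsAsFactors = FALSE
)

result <- df1 %>%
  # One row per person and class, dropping missing classes
  pivot_longer(starts_with("Class"), names_to = "class_col",
               values_to = "Class", values_drop_na = TRUE) %>%
  # One row per person, class and time slot (time is NA if the slot is not taken)
  pivot_longer(starts_with("Monday"), names_to = "time_col",
               values_to = "time") %>%
  group_by(Class, time_col) %>%
  # Keep only the people who actually have this time slot
  summarise(
    Name = if (any(!is.na(time))) {
      paste(Name[!is.na(time)], collapse = ", ")
    } else {
      NA_character_
    },
    .groups = "drop"
  ) %>%
  pivot_wider(names_from = time_col, values_from = Name)

result
```

Because the missing slots are filtered inside `summarise()`, no `<NA>` column is created and nothing has to be dropped afterwards. The result columns keep the original column names (`Monday.7.8`, `Monday.8.9`) rather than the cell labels.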