Transposing/reshaping a data frame from long to wide format

Asked: 2020-06-03 11:43:24

Tags: r transpose

I have a data frame in the following long format:

library(data.table)
studentInfo <- data.frame(University = c("A","B","C","B","A","D"), StudentID = c("S1","S1","S2","S2","S3","S3"), Subject = c("Maths", "Science", "English", "Maths", "History", "English"))

# keep.rownames should be the logical FALSE, not the string "FALSE"
studentInfo <- data.table(studentInfo, keep.rownames = FALSE)



    University   StudentID     Subject
1   A            S1            Maths
2   B            S1            Science
3   C            S2            English
4   B            S2            Maths
5   A            S3            History
6   D            S3            English

dcast(studentInfo, StudentID ~ Subject, value.var = "Subject")

This gives me the following:

 StudentID English History Maths Science
1:        S1    <NA>    <NA> Maths Science
2:        S2 English    <NA> Maths    <NA>
3:        S3 English History  <NA>    <NA>


What I would like to get is this:

    University  StudentID   S1     S3     S1      S2      S2      S3

1   A           S1          Maths                   
5   A           S3                 History              
2   B           S1                       Science            
4   B           S2                                Maths     
3   C           S2                                        English       
6   D           S3                                                English

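I am new to R. I am preparing a dataset to run a Heatmap / Oncoprint. I tried dcast from reshape2 and the spread function, but I could not get the format required for the next step of my workflow.

Thanks

2 Answers:

Answer 0: (score: 1)

You can add a column of row numbers and then reshape the data to wide format.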

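library(dplyr)
library(tidyr)

studentInfo %>%
    # remember the original row so every observation keeps its own line
    mutate(row = row_number()) %>%
    group_by(StudentID) %>%
    # build a unique label per observation (S1_1, S1_2, ...) in a new
    # column, so the grouping column itself is left unmodified
    mutate(col = paste(StudentID, row_number(), sep = "_")) %>%
    ungroup() %>%
    pivot_wider(id_cols = c(University, row), names_from = col, values_from = Subject) %>%
    select(-row)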

# A tibble: 6 x 7
#  University S1_1  S1_2    S2_1    S2_2  S3_1    S3_2   
#  <chr>      <chr> <chr>   <chr>   <chr> <chr>   <chr>  
#1 A          Maths NA      NA      NA    NA      NA     
#2 B          NA    Science NA      NA    NA      NA     
#3 C          NA    NA      English NA    NA      NA     
#4 B          NA    NA      NA      Maths NA      NA     
#5 A          NA    NA      NA      NA    History NA     
#6 D          NA    NA      NA      NA    NA      English

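Having duplicate column names in a data frame is not recommended, which is why the columns above carry a numeric suffix instead of repeating S1, S2 and S3.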
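To illustrate why duplicate names are awkward, here is a minimal sketch in base R. The data frame `dup` and its values are made up for the illustration and are not part of the question's data.

```r
# check.names = FALSE allows two columns to share the name "S1"
dup <- data.frame(S1 = "Maths", S1 = "Science",
                  check.names = FALSE, stringsAsFactors = FALSE)

# `$` returns only the first column with a matching name,
# so the second "S1" column cannot be reached by name
first_match <- dup$S1

# make.unique() appends a suffix to repeated names so each column is addressable
names(dup) <- make.unique(names(dup), sep = "_")
print(names(dup))
```

After renaming, both columns can be selected by name, which is what most downstream plotting functions expect.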

Answer 1: (score: 0)

Try this:

dcast(studentInfo, University + StudentID ~ StudentID, value.var = 'Subject')
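For anyone who wants to run this in isolation, the following is a self-contained sketch of the same call, assuming the data.table package is installed. The name `wide` is only used here for illustration. Note that this yields one column per distinct StudentID (S1, S2, S3) rather than one column per observation.

```r
library(data.table)

# Same data as in the question, built directly as a data.table
studentInfo <- data.table(
  University = c("A", "B", "C", "B", "A", "D"),
  StudentID  = c("S1", "S1", "S2", "S2", "S3", "S3"),
  Subject    = c("Maths", "Science", "English", "Maths", "History", "English")
)

# StudentID appears on both sides of the formula: on the left it is kept
# as an identifier column, on the right its values become column names
wide <- dcast(studentInfo, University + StudentID ~ StudentID,
              value.var = "Subject")
print(wide)
```

Each University/StudentID pair stays on its own row, with the subject placed in the column of the matching student and NA elsewhere.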