I have a data frame in long format that follows this pattern:
library(data.table)
studentInfo <- data.frame(University = c("A", "B", "C", "B", "A", "D"),
                          StudentID = c("S1", "S1", "S2", "S2", "S3", "S3"),
                          Subject = c("Maths", "Science", "English", "Maths", "History", "English"))
# keep.rownames expects a logical value, not the string "FALSE"
studentInfo <- data.table(studentInfo, keep.rownames = FALSE)
University StudentID Subject
1 A S1 Maths
2 B S1 Science
3 C S2 English
4 B S2 Maths
5 A S3 History
6 D S3 English
dcast(studentInfo, StudentID ~ Subject, value.var = "Subject")
I get the following:
StudentID English History Maths Science
1: S1 <NA> <NA> Maths Science
2: S2 English <NA> Maths <NA>
3: S3 English History <NA> <NA>
I would like to get the following:
  University StudentID S1    S3      S1      S2    S2      S3
1 A          S1        Maths
5 A          S3              History
2 B          S1                      Science
4 B          S2                              Maths
3 C          S2                                    English
6 D          S3                                            English
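I am new to coding in R. I am preparing a dataset to run a heatmap / OncoPrint. I tried dcast from reshape2 and the spread function, but I could not get the format required for the next step of my workflow.
Thanks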
Answer 0: (score: 1)
You can create a column with the row number and then reshape the data to wide format.
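library(dplyr)
studentInfo %>%
  # row id keeps every original row on its own line after pivoting
  mutate(row = row_number()) %>%
  group_by(StudentID) %>%
  # number repeated IDs within each student: S1_1, S1_2, ...
  mutate(StudentID = paste(StudentID, row_number(), sep = "_")) %>%
  tidyr::pivot_wider(names_from = StudentID, values_from = Subject) %>%
  select(-row)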
# A tibble: 6 x 7
# University S1_1 S1_2 S2_1 S2_2 S3_1 S3_2
# <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#1 A Maths NA NA NA NA NA
#2 B NA Science NA NA NA NA
#3 C NA NA English NA NA NA
#4 B NA NA NA Maths NA NA
#5 A NA NA NA NA History NA
#6 D NA NA NA NA NA English
Having duplicate column names in a data frame is not recommended.
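To illustrate why duplicate names cause trouble, here is a minimal base R sketch. The objects `dup` and `fixed` are hypothetical names used only for this illustration; they are not part of the question's data.

```r
# Build a tiny data frame that is allowed to keep duplicate column names
dup <- data.frame(S1 = "Maths", S1 = "Science", check.names = FALSE)
names(dup)   # both columns are called "S1"

# Selecting by name returns only the first matching column,
# so the second "S1" column cannot be reached this way
dup$S1

# make.unique() appends a suffix to repeated names so each column is addressable
fixed <- dup
names(fixed) <- make.unique(names(dup))
names(fixed)
```

This is why the code above pastes a running number onto each `StudentID` before reshaping.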
Answer 1: (score: 0)
Try this:
dcast(studentInfo, University + StudentID ~ StudentID, value.var = 'Subject')
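If unique column names are acceptable, a similar result can be sketched entirely in data.table. This is only one possible approach, not necessarily what either answer intended: it assumes the data.table package is installed, uses its `rowid()` helper to number repeated IDs, and the names `col` and `wide` are hypothetical, chosen for this illustration.

```r
library(data.table)

studentInfo <- data.table(
  University = c("A", "B", "C", "B", "A", "D"),
  StudentID  = c("S1", "S1", "S2", "S2", "S3", "S3"),
  Subject    = c("Maths", "Science", "English", "Maths", "History", "English")
)

# Hypothetical helper column: number each occurrence of a StudentID (S1_1, S1_2, ...)
studentInfo[, col := paste(StudentID, rowid(StudentID), sep = "_")]

# Keep University and StudentID as row identifiers and spread Subject
# across the uniquely named columns; absent combinations become NA
wide <- dcast(studentInfo, University + StudentID ~ col, value.var = "Subject")
print(wide)
```

Each University/StudentID pair occurs once in the sample data, so no aggregation function is needed; with repeated pairs, `dcast` would fall back to counting unless `fun.aggregate` is supplied.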