Reshape a numeric variable in a data frame according to a factor in another column

Time: 2016-07-08 14:17:41

Tags: r

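I have been experimenting with a lot of R code, such as melt(), split() and reshape(), but none of it really solves my problem. Based on the df below, I want to create a new df that no longer has the Long_group and B_Lymph_count columns, but instead has three separate columns, "Aubagio_0", "Aubagio_1" and "Aubagio_2", each holding the B_Lymph_count values for the corresponding MS.number (the patient ID).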

Current df:

    MS.number  B_Lymph_count Long_group
13  "MS072/1"  " 57014"      "Aubagio_0"
14  "MS072/1"  "116730"      "Aubagio_1"
46  "MS1246/1" "117843"      "Aubagio_0"
47  "MS1246/1" "209583"      "Aubagio_1"
52  "MS1253/1" " 71434"      "Aubagio_0"
53  "MS1253/1" "130382"      "Aubagio_1"
100 "MS717/1"  " 63916"      "Aubagio_0"
101 "MS717/1"  " 62434"      "Aubagio_1"
102 "MS717/1"  " 43533"      "Aubagio_2"

What I would like to produce:

MS.number  Aubagio_0 Aubagio_1 Aubagio_2
MS717/1        63916     62434     43533
MS1253/1       71434    130382        NA
...

I hope this is possible in R. Thank you very much for your replies!

1 answer:

Answer 0 (score: 1)

You can try

library(reshape2)
# One row per MS.number, one column per Long_group level; empty cells become 0
dcast(data = d, MS.number ~ Long_group, value.var = "B_Lymph_count", fill = 0)

  MS.number Aubagio_0 Aubagio_1 Aubagio_2
1   MS072/1     57014    116730         0
2  MS1246/1    117843    209583         0
3  MS1253/1     71434    130382         0
4   MS717/1     63916     62434     43533

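The fill argument specifies the value placed in empty cells, that is, MS.number/Long_group combinations with no row in the data. You can also set it to NA, for example, which is what you get by default when fill is omitted.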
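To get the NA shown in the question's desired output rather than 0, the same call can be made without fill. The sketch below rebuilds the example data compactly as a plain data.frame; the name wide is only an illustrative choice.

```r
library(reshape2)

# Same values as the example data, rebuilt compactly for a self-contained run
d <- data.frame(
  MS.number = c("MS072/1", "MS072/1", "MS1246/1", "MS1246/1",
                "MS1253/1", "MS1253/1", "MS717/1", "MS717/1", "MS717/1"),
  B_Lymph_count = c(57014L, 116730L, 117843L, 209583L, 71434L,
                    130382L, 63916L, 62434L, 43533L),
  Long_group = c("Aubagio_0", "Aubagio_1", "Aubagio_0", "Aubagio_1",
                 "Aubagio_0", "Aubagio_1", "Aubagio_0", "Aubagio_1",
                 "Aubagio_2")
)

# With fill omitted, patients without an Aubagio_2 measurement get NA
wide <- dcast(d, MS.number ~ Long_group, value.var = "B_Lymph_count")
wide
```

Keeping NA instead of 0 is usually safer for later analysis, because a count of 0 would otherwise be indistinguishable from a missing measurement.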

Data

d <- structure(list(MS.number = structure(c(1L, 1L, 2L, 2L, 3L, 3L, 
4L, 4L, 4L), .Label = c("MS072/1", "MS1246/1", "MS1253/1", "MS717/1"
), class = "factor"), B_Lymph_count = c(57014L, 116730L, 117843L, 
209583L, 71434L, 130382L, 63916L, 62434L, 43533L), Long_group = structure(c(1L, 
2L, 1L, 2L, 1L, 2L, 1L, 2L, 3L), .Label = c("Aubagio_0", "Aubagio_1", 
"Aubagio_2"), class = "factor")), .Names = c("MS.number", "B_Lymph_count", 
"Long_group"), class = "data.frame", row.names = c("13", "14", 
"46", "47", "52", "53", "100", "101", "102"))
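
Since the question mentions reshape(), here is a rough base R sketch of the same long-to-wide step that does not need reshape2. It assumes, based on the quoted printout in the question, that the original columns are character strings and that some counts carry a leading space; the small df below is a hypothetical reconstruction of that input, not the asker's actual object.

```r
# Hypothetical reconstruction of the question's input: every column is
# character, and some counts carry a leading space
df <- data.frame(
  MS.number = c("MS072/1", "MS072/1", "MS717/1", "MS717/1", "MS717/1"),
  B_Lymph_count = c(" 57014", "116730", " 63916", " 62434", " 43533"),
  Long_group = c("Aubagio_0", "Aubagio_1", "Aubagio_0", "Aubagio_1",
                 "Aubagio_2"),
  stringsAsFactors = FALSE
)

# Strip the padding and convert the counts to integers before reshaping
df$B_Lymph_count <- as.integer(trimws(df$B_Lymph_count))

# Long to wide: one row per MS.number, one column per Long_group value
wide <- reshape(df, idvar = "MS.number", timevar = "Long_group",
                direction = "wide")

# reshape() names the new columns "B_Lymph_count.<group>"; drop that prefix
names(wide) <- sub("^B_Lymph_count\\.", "", names(wide))
rownames(wide) <- NULL
wide
```

Missing combinations come out as NA here as well. Converting the counts first matters because otherwise the wide columns would stay character and could not be used in arithmetic.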