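I have a dataset that I have been cleaning for someone else, and they want one particular column split into several columns by observation type. For example, there is a single column of diagnoses, and she wants it expanded so that each diagnosis gets its own column. A column containing depression, ADHD, asthma, cancer, and so on would become one column named Depression, one named ADHD, and so forth.
I'm pretty sure this violates tidy data principles, but the person I'm doing this for insists that this is how they want it. I have looked through the tidyr and dplyr packages but have had no luck so far, and could use some advice.
Thanks in advance for your help.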
Order Diagnosis
1 1 Synaesthesia
2 1 Synaesthesia
3 1 Synaesthesia
4 1 Synaesthesia
5 1 Synaesthesia
6 1 Synaesthesia
7 1 ADHD
8 1 ADHD
9 1 ADHD
10 1 ADHD
11 1 ADHD
12 1 ADHD
13 1 ADHD
14 1 ADHD
15 1 ADHD
16 1 ADHD
17 1 ADHD
18 1 ADHD
19 1 ADHD
20 1 ADHD
21 1 ADHD
22 1 ADHD
23 1 ADHD
24 1 ADHD
25 1 ADHD
26 1 ADHD
27 1 ADHD
28 1 ADHD
29 1 ADHD
30 1 ADHD
31 1 ADHD
32 1 ADHD
33 1 ADHD
34 1 ADHD
35 1 ADHD
36 1 ADHD
37 1 ADHD
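Answer 0 (score: 1)
Your expected result is not entirely clear, but one interpretation is that you want to recode the data, for example with dummy coding (one 0/1 indicator column per diagnosis).
A simple way to do this is model.matrix(). Try this: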
# dat is the data frame from the question; "- 1" drops the intercept so every level gets a column
model.matrix(~ Diagnosis - 1, dat)
DiagnosisADHD DiagnosisSynaesthesia
1 0 1
2 0 1
3 0 1
4 0 1
5 0 1
6 0 1
7 1 0
8 1 0
9 1 0
10 1 0
...
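If the indicator columns need to sit next to the original data under plain names, the result of model.matrix() can be bound back onto the data frame. The following is only a sketch: `dat` is rebuilt here by hand from the rows shown in the question (6 Synaesthesia, then 31 ADHD), and the names `dummies` and `wide` are made up for illustration.

```r
# Hypothetical reconstruction of the question's data
dat <- data.frame(
  Order = 1,
  Diagnosis = c(rep("Synaesthesia", 6), rep("ADHD", 31)),
  stringsAsFactors = TRUE
)

# One 0/1 indicator column per diagnosis; "- 1" drops the intercept
dummies <- model.matrix(~ Diagnosis - 1, data = dat)

# Strip the "Diagnosis" prefix that model.matrix() adds to the column names
colnames(dummies) <- sub("^Diagnosis", "", colnames(dummies))

# Bind the indicators back onto the original columns
wide <- cbind(dat, as.data.frame(dummies))
head(wide)
```

Each row keeps its original Diagnosis value and gains one column per diagnosis, so the original column can be dropped afterwards if it is not wanted.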
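Answer 1 (score: 0)
You can split the vector (or, in your case, the column) by its values, pad each piece with NA to a common length, and bind the pieces together into a full-fledged data frame or matrix.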
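# example data: 100 random draws from the letters A to E (no seed set, so counts vary between runs)
x <- sample(LETTERS[1:5], size = 100, replace = TRUE)
# one character vector per distinct value
sx <- split(x, x)
# length of the longest group
ml <- max(lengths(sx))
# pad the shorter groups with NAs, then bind them as columns of a matrix
do.call("cbind", lapply(sx, FUN = function(m) c(m, rep(NA, ml - length(m)))))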
A B C D E
[1,] "A" "B" "C" "D" "E"
[2,] "A" "B" "C" "D" "E"
[3,] "A" "B" "C" "D" "E"
[4,] "A" "B" "C" "D" "E"
[5,] "A" "B" "C" "D" "E"
[6,] "A" "B" "C" "D" "E"
[7,] "A" "B" "C" "D" "E"
[8,] "A" "B" "C" "D" "E"
[9,] "A" "B" "C" "D" "E"
[10,] "A" "B" "C" "D" "E"
[11,] "A" "B" "C" "D" "E"
[12,] "A" "B" "C" "D" "E"
[13,] "A" "B" "C" "D" "E"
[14,] "A" "B" "C" "D" "E"
[15,] NA "B" "C" "D" "E"
[16,] NA "B" "C" "D" "E"
[17,] NA "B" "C" "D" "E"
[18,] NA "B" "C" "D" "E"
[19,] NA "B" "C" "D" "E"
[20,] NA "B" "C" "D" "E"
[21,] NA "B" "C" "D" NA
[22,] NA NA "C" "D" NA
[23,] NA NA NA "D" NA
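Applied to the question's data, the same split-and-pad idea might look like the sketch below. The `diagnosis` vector is a hand-built stand-in for the Diagnosis column (6 Synaesthesia, 31 ADHD), and `groups`, `longest`, and `padded` are illustrative names, not part of the original answer.

```r
# Hypothetical stand-in for the question's Diagnosis column
diagnosis <- c(rep("Synaesthesia", 6), rep("ADHD", 31))

# One character vector per diagnosis
groups <- split(diagnosis, diagnosis)

# Length of the longest group
longest <- max(lengths(groups))

# Pad shorter groups with NA; sapply() returns a matrix because all pieces have equal length
padded <- sapply(groups, function(g) c(g, rep(NA, longest - length(g))))
dim(padded)
```

Note that this layout no longer lines up with the original rows: each column is just the list of matching entries followed by NA padding, so it suits a side-by-side listing more than a row-wise recoding.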