My data.table looks like this:
ID age gender relationship ACESscore PAPre PAPost NAPre NAPost PADelta NADelta
3 6192 32 2 2 2 8 10 NA 3 2 NA
4 6191 31 1 1 0 8 10 4 2 2 -2
6 8421 25 1 2 0 9 9 3 5 0 2
7 9991 18 1 NA 10 7 9 2 3 2 1
8 9992 18 2 NA 5 8 8 4 2 0 -2
9 7612 35 2 1 1 4 7 5 3 3 -2
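I want to make line plots of PA (Pre vs. Post) and NA (Pre vs. Post), and I think the best way to do that (please correct me if I'm wrong) is to get a new table like this: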
ID age gender relationship ACESscore PA NA PREPOST
3 6192 32 2 2 2 10 1
4 6191 31 1 1 0 10 1
6 8421 25 1 2 0 9 1
7 9991 18 1 NA 10 9 1
8 9992 18 2 NA 5 8 1
9 7612 35 2 1 1 7 1
10 6192 32 2 2 8 NA 2
11 6191 31 1 1 8 4 2
12 8421 25 1 2 9 3 2
13 9991 18 1 NA 7 2 2
14 9992 18 2 NA 8 4 2
15 7612 35 2 1 4 5 2
How can I reshape it so that there are two rows per ID, with PAPre and PAPost stacked into one column, and the same for NAPre and NAPost?
Answer 0 (score: 2)
You can do this by reshaping with melt.
melt(dat[, -c("PADelta", "NADelta")],
measure.vars=list(c("PAPre", "PAPost"), c("NAPre", "NAPost")),
value.name=c("PAVal", "NAVal"), variable.name="prepost")
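Here, dat[, -c("PADelta", "NADelta")] drops the delta variables. The variables to be collapsed go in a list passed to the measure.vars argument, where each list element becomes one value column. The last two arguments name the newly created columns.
This returns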
ID age gender relationship ACESscore prepost PAVal NAVal
1: 6192 32 2 2 2 1 8 NA
2: 6191 31 1 1 0 1 8 4
3: 8421 25 1 2 0 1 9 3
4: 9991 18 1 NA 10 1 7 2
5: 9992 18 2 NA 5 1 8 4
6: 7612 35 2 1 1 1 4 5
7: 6192 32 2 2 2 2 10 3
8: 6191 31 1 1 0 2 10 2
9: 8421 25 1 2 0 2 9 5
10: 9991 18 1 NA 10 2 9 3
11: 9992 18 2 NA 5 2 8 2
12: 7612 35 2 1 1 2 7 3
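In the output above, prepost is coded 1 and 2, following the order in which the columns were listed within each measure.vars group (Pre first, Post second). For plotting it may be easier to work with readable labels. A minimal sketch, assuming a cut-down two-row stand-in for dat with the delta columns already dropped; the object names `long` and `means`, the column name `NA_score`, and the labels "Pre"/"Post" are my own choices, not part of the original answer:

```r
library(data.table)

# Cut-down stand-in for `dat` (first two rows, delta columns already dropped)
dat <- data.table(
  ID     = c(6192L, 6191L),
  PAPre  = c(8L, 8L),  PAPost = c(10L, 10L),
  NAPre  = c(NA, 4L),  NAPost = c(3L, 2L)
)

long <- melt(dat,
             measure.vars  = list(c("PAPre", "PAPost"), c("NAPre", "NAPost")),
             value.name    = c("PAVal", "NAVal"),
             variable.name = "prepost")

# Relabel the two prepost codes in the order the columns were listed
# (Pre first, then Post)
long[, prepost := factor(prepost, labels = c("Pre", "Post"))]

# Mean score per time point: the values a Pre/Post line plot would connect
means <- long[, .(PA       = mean(PAVal, na.rm = TRUE),
                  NA_score = mean(NAVal, na.rm = TRUE)),
              by = prepost]
```

Relabelling by position only works if the columns in each measure.vars group are listed in the same Pre, Post order, so it is worth checking that before relying on the labels.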
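Note: the initial version of this post used dat[, .SD, .SDcols=-c("PADelta", "NADelta")] to subset the variables. In the comments, Frank pointed out that dat[, -c("PADelta", "NADelta")] does the same thing more concisely.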
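As a quick check of that note, the two ways of dropping columns can be compared side by side. This is only a sketch on a small made-up table (the object names `via_sd`, `via_minus` and `via_bang` are hypothetical); the `!c(...)` form is a third spelling that data.table also accepts in j:

```r
library(data.table)

# Small stand-in table; only the column names matter here
dat <- data.table(ID      = c(6192L, 6191L),
                  PAPre   = c(8L, 8L),
                  PADelta = c(2L, 2L),
                  NADelta = c(NA, -2L))

# Original form from the first version of the answer
via_sd    <- dat[, .SD, .SDcols = -c("PADelta", "NADelta")]

# Shorter form suggested in the comments
via_minus <- dat[, -c("PADelta", "NADelta")]

# Equivalent negation with `!`
via_bang  <- dat[, !c("PADelta", "NADelta")]

names(via_minus)
```

None of these modify dat itself; each returns a new data.table without the delta columns.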
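Frank also noted that data.table's patterns function can be used to select the variables to collapse by matching their names against regular expressions. This gives a more concise and more scalable approach (imagine more than 2 time periods).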
melt(dat[, -c("PADelta", "NADelta")],
measure.vars=patterns("^PA", "^NA"),
value.name=c("PAVal", "NAVal"), variable.name="prepost")
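One thing to watch with patterns: "^PA" and "^NA" also match PADelta and NADelta, which is why the call above drops those columns first. If the deltas were left in, each group would contain three columns and prepost would get a third level. A sketch of an alternative, assuming you would rather keep the delta columns: anchor the patterns so that only the Pre/Post columns match. The regular expressions below are my own and not from the original answer, and the table is a cut-down stand-in for dat:

```r
library(data.table)

# Cut-down stand-in for `dat`, delta columns kept this time
dat <- data.table(
  ID      = c(6192L, 6191L),
  PAPre   = c(8L, 8L),  PAPost  = c(10L, 10L),
  NAPre   = c(NA, 4L),  NAPost  = c(3L, 2L),
  PADelta = c(2L, 2L),  NADelta = c(NA, -2L)
)

# Anchored patterns match only the Pre/Post columns; everything else
# (including the deltas) is treated as an id column and repeated per row
long <- melt(dat,
             measure.vars  = patterns("^PA(Pre|Post)$", "^NA(Pre|Post)$"),
             value.name    = c("PAVal", "NAVal"),
             variable.name = "prepost")
```

With this variant the delta columns are carried along as id columns, so each ID's PADelta and NADelta appear on both its Pre row and its Post row.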
Data
dat <-
structure(list(ID = c(6192L, 6191L, 8421L, 9991L, 9992L, 7612L
), age = c(32L, 31L, 25L, 18L, 18L, 35L), gender = c(2L, 1L,
1L, 1L, 2L, 2L), relationship = c(2L, 1L, 2L, NA, NA, 1L), ACESscore = c(2L,
0L, 0L, 10L, 5L, 1L), PAPre = c(8L, 8L, 9L, 7L, 8L, 4L), PAPost = c(10L,
10L, 9L, 9L, 8L, 7L), NAPre = c(NA, 4L, 3L, 2L, 4L, 5L), NAPost = c(3L,
2L, 5L, 3L, 2L, 3L), PADelta = c(2L, 2L, 0L, 2L, 0L, 3L), NADelta = c(NA,
-2L, 2L, 1L, -2L, -2L)), .Names = c("ID", "age", "gender", "relationship",
"ACESscore", "PAPre", "PAPost", "NAPre", "NAPost", "PADelta",
"NADelta"), row.names = c(NA, -6L), class = c("data.table", "data.frame"))