I need to transform the following (simplified) dataset, which is created by this code:
structure(list(W1.1 = structure(c(1L, NA, NA), .Names = c("case1",
"case2", "case3"), .Label = "1", class = "factor"), R1.1 = structure(c(1L,
NA, NA), .Names = c("case1", "case2", "case3"), .Label = "2", class = "factor"),
W1.2 = structure(c(NA, 1L, NA), .Names = c("case1", "case2",
"case3"), .Label = "1", class = "factor"), R1.2 = structure(c(NA,
1L, NA), .Names = c("case1", "case2", "case3"), .Label = "1", class = "factor"),
W2.1 = structure(c(NA, 1L, NA), .Names = c("case1", "case2",
"case3"), .Label = "1", class = "factor"), R2.1 = structure(c(NA,
1L, NA), .Names = c("case1", "case2", "case3"), .Label = "1", class = "factor"),
W2.2 = structure(c(1L, NA, NA), .Names = c("case1", "case2",
"case3"), .Label = "2", class = "factor"), R2.2 = structure(c(1L,
NA, NA), .Names = c("case1", "case2", "case3"), .Label = "1", class = "factor"),
W3.1 = structure(c(1L, NA, NA), .Names = c("case1", "case2",
"case3"), .Label = "1", class = "factor"), R3.1 = structure(c(1L,
NA, NA), .Names = c("case1", "case2", "case3"), .Label = "1", class = "factor"),
W3.2 = structure(c(1L, 1L, NA), .Names = c("case1", "case2",
"case3"), .Label = "1", class = "factor"), R3.2 = structure(c(1L,
1L, NA), .Names = c("case1", "case2", "case3"), .Label = "1", class = "factor"),
age = structure(c(3L, 1L, 2L), .Names = c("case1", "case2",
"case3"), .Label = c("20", "48", "56"), class = "factor"),
gender = structure(c(2L, 1L, 2L), .Names = c("case1", "case2",
"case3"), .Label = c("female", "male"), class = "factor")), .Names = c("W1.1",
"R1.1", "W1.2", "R1.2", "W2.1", "R2.1", "W2.2", "R2.2", "W3.1",
"R3.1", "W3.2", "R3.2", "age", "gender"), row.names = c(NA, 3L
), class = "data.frame")
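For the new data, I want:
- one row for each x.x, containing the Rx.x value, age, and gender;
- a row returned only when Wx.x is 1. When it is 2 or NA, I don't need it.
For my example, the resulting dataset should look like this: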
  incident type Where Reported age gender
1        1  1.1     1        2  56   male
2        2  3.1     1        1  56   male
3        3  3.2     1        1  56   male
4        4  1.2     1        1  20 female
5        5  2.1     1        1  20 female
6        6  3.2     1        1  20 female
Note: the "Where" column can even be omitted, since it should be a constant vector of 1s and I don't need it for the analysis.
Answer 0 (score: 5)
This is mainly a job for reshape(). Assuming your original dataset is named "temp":
First, reshape it from wide to long format.
temp.long <- reshape(temp, direction = "long",
                     idvar = c("age", "gender"),
                     varying = which(!names(temp) %in% c("age", "gender")),
                     sep = "")
temp.long
#               age gender time    W    R
# 56.male.1.1    56   male  1.1    1    2
# 20.female.1.1  20 female  1.1 <NA> <NA>
# 48.male.1.1    48   male  1.1 <NA> <NA>
# 56.male.1.2    56   male  1.2 <NA> <NA>
# 20.female.1.2  20 female  1.2    1    1
# 48.male.1.2    48   male  1.2 <NA> <NA>
# 56.male.2.1    56   male  2.1 <NA> <NA>
# 20.female.2.1  20 female  2.1    1    1
# 48.male.2.1    48   male  2.1 <NA> <NA>
# 56.male.2.2    56   male  2.2    2    1
# 20.female.2.2  20 female  2.2 <NA> <NA>
# 48.male.2.2    48   male  2.2 <NA> <NA>
# 56.male.3.1    56   male  3.1    1    1
# 20.female.3.1  20 female  3.1 <NA> <NA>
# 48.male.3.1    48   male  3.1 <NA> <NA>
# 56.male.3.2    56   male  3.2    1    1
# 20.female.3.2  20 female  3.2    1    1
# 48.male.3.2    48   male  3.2 <NA> <NA>
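To see how reshape() pulls names like "W1.1" apart when sep = "", here is a minimal sketch on a toy data frame. The data frame "toy" and its "id" column are hypothetical and not part of the original question; the call itself mirrors the one above.

```r
# Hypothetical toy data: two measures (W, R) observed at "times" 1.1 and 1.2.
toy <- data.frame(
  id   = c("a", "b"),
  W1.1 = c(1, NA),
  R1.1 = c(2, NA),
  W1.2 = c(NA, 1),
  R1.2 = c(NA, 1)
)

# With sep = "", reshape() splits each name between the leading letter and
# the first digit, so "W1.1" becomes variable "W" at time 1.1.
toy.long <- reshape(toy, direction = "long",
                    idvar = "id",
                    varying = which(names(toy) != "id"),
                    sep = "")

names(toy.long)  # the id column, then "time", then one column per measure
toy.long
```

Each wide row becomes one long row per time value, which is why the three cases above turn into 18 rows.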
Second, do some cleanup.
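temp.long <- na.omit(temp.long)
# Keep only rows where W is 1; logical indexing is safe even if no row has W == 2
temp.long <- temp.long[temp.long$W == 1, ]
# Sort males first (gender levels are "female", "male"), then by time
temp.long <- temp.long[order(-as.integer(temp.long$gender), temp.long$time), ]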
rownames(temp.long) <- NULL
temp.long$incident <- seq(nrow(temp.long))
temp.long
#   age gender time W R incident
# 1  56   male  1.1 1 2        1
# 2  56   male  3.1 1 1        2
# 3  56   male  3.2 1 1        3
# 4  20 female  1.2 1 1        4
# 5  20 female  2.1 1 1        5
# 6  20 female  3.2 1 1        6
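A note on row filtering: dropping rows with a negative which() index returns nothing at all when no element matches, whereas logical indexing keeps everything. A small sketch with a hypothetical vector "x" (not from the original data):

```r
# Hypothetical values; none of them equals 2.
x <- c(1, 1, 1)

# which() finds no match, so the negative index is empty and selects nothing.
dropped <- x[-which(x == 2)]

# Logical indexing keeps every element that is not 2.
kept <- x[x != 2]

length(dropped)
length(kept)
```

This only matters when a dataset happens to contain no rows with W equal to 2, but in that case the difference is between keeping all rows and losing all of them.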
If it matters, you can do further cleanup to change the column names and column order.
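A minimal sketch of that final renaming and reordering, assuming the target names from the question ("type", "Where", "Reported"). The data frame is rebuilt by hand here so the snippet runs on its own, with plain character columns standing in for the factors; in practice you would apply the same steps to the cleaned temp.long. The name "result" is just an illustration.

```r
# Stand-in for the cleaned temp.long (values copied from the output above).
temp.long <- data.frame(
  age      = c("56", "56", "56", "20", "20", "20"),
  gender   = c("male", "male", "male", "female", "female", "female"),
  time     = c(1.1, 3.1, 3.2, 1.2, 2.1, 3.2),
  W        = c("1", "1", "1", "1", "1", "1"),
  R        = c("2", "1", "1", "1", "1", "1"),
  incident = 1:6
)

# Rename columns to the names used in the question.
names(temp.long)[names(temp.long) == "time"] <- "type"
names(temp.long)[names(temp.long) == "W"]    <- "Where"
names(temp.long)[names(temp.long) == "R"]    <- "Reported"

# Reorder columns to match the desired layout.
result <- temp.long[, c("incident", "type", "Where", "Reported", "age", "gender")]
result
```

Since "Where" is constant, it could simply be left out of the column vector in the last step.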