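I want to reshape my data to wide format, but I also want to take the mean of the third column for each entry in the fourth column, something like (0.21+0.05+0.06)/total.
I have read about the reshape package in R, but I don't know which aggregation function to use to compute the mean before converting to wide format.
Input data frame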
CID100000085 C0000737 0.21 Abdominal pain
CID100000085 C0000737 0.21 Gastrointestinal pain
CID100000085 C0000737 0.05 Abdominal pain
CID100000085 C0000737 0.05 Gastrointestinal pain
CID100000085 C0000737 0.06 Abdominal pain
CID100000085 C0000737 0.06 Gastrointestinal pain
Expected output
Abdominal pain Gastrointestinal pain
CID100000085 C0000737 0.0166 0.0166
Answer 0 (score: 3)
We can use dcast from data.table, passing mean as the aggregation function:
library(data.table)
dcast(setDT(df1), id1 + id2 ~ pain, value.var = "value", fun.aggregate = mean)
# id1 id2 Abdominal pain Gastrointestinal pain
#1: CID100000085 C0000737 0.1066667 0.1066667
Data
df1 <- structure(list(id1 = c("CID100000085", "CID100000085", "CID100000085",
"CID100000085", "CID100000085", "CID100000085"), id2 = c("C0000737",
"C0000737", "C0000737", "C0000737", "C0000737", "C0000737"),
value = c(0.21, 0.21, 0.05, 0.05, 0.06, 0.06), pain = c("Abdominal pain",
"Gastrointestinal pain", "Abdominal pain", "Gastrointestinal pain",
"Abdominal pain", "Gastrointestinal pain")),
.Names = c("id1",
"id2", "value", "pain"), class = "data.frame", row.names = c(NA,
-6L))
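As a quick cross-check of the means shown above, here is a minimal base R sketch. It is not part of the original answer: the vectors `value` and `pain` are rebuilt here purely for illustration, using the same numbers as df1, and tapply computes the mean per pain category.

```r
# Illustrative vectors holding the same values as the example data
value <- c(0.21, 0.21, 0.05, 0.05, 0.06, 0.06)
pain  <- rep(c("Abdominal pain", "Gastrointestinal pain"), times = 3)

# Mean of value within each pain category
means <- tapply(value, pain, mean)
print(means)
```

Each group contains 0.21, 0.05 and 0.06, so both means equal (0.21 + 0.05 + 0.06) / 3, which agrees with the 0.1066667 printed by dcast.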
Answer 1 (score: 2)
In base R, you can try reshape together with aggregate:
reshape(aggregate(V3~V1+V2+V4, df, mean),
idvar = "V1", timevar = "V4", direction = "wide")[,-4]
# V1 V2.Abdominalpain V3.Abdominalpain V3.Gastrointestinalpain
#1 CID100000085 C0000737 0.1066667 0.1066667
Data
df <- structure(list(V1 = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = "CID100000085", class = "factor"),
V2 = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = "C0000737", class = "factor"),
V3 = c(0.21, 0.21, 0.05, 0.05, 0.06, 0.06), V4 = structure(c(1L,
2L, 1L, 2L, 1L, 2L), .Label = c("Abdominalpain", "Gastrointestinalpain"
), class = "factor")), .Names = c("V1", "V2", "V3", "V4"), class = "data.frame", row.names = c(NA,
-6L))