Average the scores of duplicate entries and convert to wide format

Time: 2017-04-22 15:57:47

Tags: r dataframe aggregate mean reshape

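I want to reshape my data into wide format, but I also want the mean of the third column for each entry in the fourth column, e.g. (0.21+0.05+0.06)/total. I have read about the reshape package in R, but I don't know which aggregation function to use to compute the mean before converting to wide format.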

Input data frame

CID100000085    C0000737      0.21        Abdominal pain
CID100000085    C0000737      0.21        Gastrointestinal pain
CID100000085    C0000737      0.05        Abdominal pain
CID100000085    C0000737      0.05        Gastrointestinal pain
CID100000085    C0000737      0.06        Abdominal pain
CID100000085    C0000737      0.06        Gastrointestinal pain

Expected output

                                Abdominal pain   Gastrointestinal pain
   CID100000085    C0000737     0.0166           0.0166

2 answers:

Answer 0 (score: 3)

We can use dcast from data.table, passing mean as the aggregation function:

library(data.table)
# Cast to wide format, averaging value within each id1/id2/pain group
dcast(setDT(df1), id1 + id2 ~ pain, value.var = "value", fun.aggregate = mean)
#            id1      id2 Abdominal pain Gastrointestinal pain
#1: CID100000085 C0000737      0.1066667             0.1066667

Data

df1 <- structure(list(id1 = c("CID100000085", "CID100000085", "CID100000085", 
"CID100000085", "CID100000085", "CID100000085"), id2 = c("C0000737",  
 "C0000737", "C0000737", "C0000737", "C0000737", "C0000737"), 
value = c(0.21, 0.21, 0.05, 0.05, 0.06, 0.06), pain = c("Abdominal pain", 
"Gastrointestinal pain", "Abdominal pain", "Gastrointestinal pain", 
"Abdominal pain", "Gastrointestinal pain")), 
.Names = c("id1", 
"id2", "value", "pain"), class = "data.frame", row.names = c(NA, 
-6L))
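
As a cross-check of the arithmetic, here is a minimal base R sketch that computes the same per-group means without data.table. It rebuilds the df1 values from above in compact form; the helper names id and wide are illustrative only and are not part of the original answer.

```r
# Same values as df1 above, rebuilt compactly (length-1 columns are recycled).
df1 <- data.frame(
  id1   = "CID100000085",
  id2   = "C0000737",
  value = c(0.21, 0.21, 0.05, 0.05, 0.06, 0.06),
  pain  = rep(c("Abdominal pain", "Gastrointestinal pain"), times = 3),
  stringsAsFactors = FALSE
)

# tapply() with two grouping vectors returns a matrix:
# one row per id1/id2 pair, one column per pain term, cells hold the mean.
wide <- with(df1, tapply(value, list(id = paste(id1, id2), pain = pain), mean))
print(wide)

# Each cell is (0.21 + 0.05 + 0.06) / 3, i.e. about 0.1067.
```

The result is a matrix rather than a data frame, so this is mainly useful for checking the numbers; dcast keeps id1 and id2 as separate columns.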

Answer 1 (score: 2)

In base R, you can try reshape together with aggregate:

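# Average V3 per V1/V2/V4 group, reshape to wide, then drop the
# duplicated V2 column (column 4) that reshape creates for the second term
reshape(aggregate(V3~V1+V2+V4, df, mean), 
        idvar = "V1", timevar = "V4", direction = "wide")[,-4]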

#            V1 V2.Abdominalpain V3.Abdominalpain V3.Gastrointestinalpain
#1 CID100000085         C0000737        0.1066667               0.1066667

Data

df <- structure(list(V1 = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = "CID100000085", class = "factor"), 
    V2 = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = "C0000737", class = "factor"), 
    V3 = c(0.21, 0.21, 0.05, 0.05, 0.06, 0.06), V4 = structure(c(1L, 
    2L, 1L, 2L, 1L, 2L), .Label = c("Abdominalpain", "Gastrointestinalpain"
    ), class = "factor")), .Names = c("V1", "V2", "V3", "V4"), class = "data.frame", row.names = c(NA, 
-6L))
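
A possible variant of the same idea, sketched under the assumption that V1 and V2 together identify a row: listing both in idvar keeps V2 as a single column, so the [,-4] step is not needed. The data frame is rebuilt compactly with the same values as df above; agg and wide are illustrative names.

```r
# Same values as df above; stringsAsFactors = TRUE mirrors the factor columns.
df <- data.frame(
  V1 = "CID100000085",
  V2 = "C0000737",
  V3 = c(0.21, 0.21, 0.05, 0.05, 0.06, 0.06),
  V4 = rep(c("Abdominalpain", "Gastrointestinalpain"), times = 3),
  stringsAsFactors = TRUE
)

# Step 1: mean of V3 for every V1/V2/V4 combination (long format, 2 rows).
agg <- aggregate(V3 ~ V1 + V2 + V4, data = df, FUN = mean)

# Step 2: spread V4 into columns; with both ids in idvar only V3 varies.
wide <- reshape(agg, idvar = c("V1", "V2"), timevar = "V4", direction = "wide")
print(names(wide))
# "V1" "V2" "V3.Abdominalpain" "V3.Gastrointestinalpain"
```

Both V3 columns hold (0.21 + 0.05 + 0.06) / 3, about 0.1067, the same means as the one-liner above.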