Is it possible to aggregate or summarize a dataset in R by the median?

Time: 2016-11-21 10:08:07

Tags: r dplyr aggregate median

I am trying to aggregate a dataset in R by the median.

d <- aggregate(c(d$user_reported_percent, d$machine_percent), 
                       by = list(d$day), FUN=median, simplify = TRUE, drop = TRUE)

But R keeps complaining, and I am not sure whether aggregating by the median even makes sense.

One of the errors R gives me: Error in aggregate.data.frame(as.data.frame(x), ...) : arguments must have same length

Then I tried using mutate to at least compute the median:

d <- d %>% group_by(day) %>% mutate(median=median(user_reported_percent))

The error is: Error: invalid subscript type 'integer'

I would appreciate any help! Thanks a lot!

P.S. With mean, everything works perfectly.

My dataset looks like this:

structure(list(esmFollValue = c(36.00852, 8.688648, 0.6372048, 
13.7394, 0.7599012, 16.43628, 7.569684, 0.4502016, 0.7630464, 
0.781386, 0.5116056, 0.858756, 18.06108, 0.5473332, 14.62944, 
14.62944, 14.07216, 0.5366868, 14.12892, 0.7354944), esmHappValue = c(100L, 
80L, 80L, 80L, 60L, 80L, 60L, 60L, 80L, 60L, 100L, 60L, 80L, 
60L, 60L, 60L, 60L, 60L, 80L, 60L), deviceId = structure(c(11L, 
11L, 11L, 6L, 3L, 15L, 3L, 3L, 15L, 3L, 15L, 15L, 15L, 15L, 3L, 
3L, 15L, 3L, 9L, 9L), .Label = c("1e6c1183-af64-4860-b2d6-533cab7afe6c", 
"34209e3d-1a82-4f75-95c8-846be8a1be03", "7066f4af-82f3-4369-8f45-70d1ea3d22f2", 
"7cf78328-60c5-4564-9dd0-309cb0b3d5ad", "95b11f22-91e8-46d0-88d9-4f197267aa29", 
"a0c89d2a-d22d-41d0-a070-b9887d911953", "cde8cc10-7212-4a41-ae9b-bbeb51dbe8ed", 
"d150bfa4-0b52-47a0-b450-1eb21aaada53", "d41db7bc-2b81-4111-9b32-a0aab55cb25a", 
"d7e8e8c7-5190-4f0b-aa49-72e520bc9aad", "dd1218a2-4e67-4cbf-bf4d-9e288865aa63", 
"f093abf9-22e1-47e6-ae5d-1238629d8542", "fae0dd29-2b89-4c1d-b5ad-7858abe122ac", 
"feeb0ab0-7d13-4a5c-b0df-58dd85c7f607", "ff883e61-c9a9-4e6b-8b6b-cab3e5535879"
), class = "factor"), timestamp = c(1457272936.882, 1457337998.931, 
1457424251.996, 1457429767.632, 1457597635.755, 1457683537.604, 
1457861178.161, 1457964712.356, 1458029223.54, 1458046931.652, 
1458051135.219, 1458115293.069, 1458133652.503, 1458202019.302, 
1458203945.674, 1458203945.787, 1458306790.803, 1458308783.441, 
1458460903.755, 1458480932.088), group = structure(c(1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L), .Label = c("groupA", "groupB", "groupC", "groupD"), class = "factor"), 
    cameraFeed = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("Non-visible camera feed", 
    "Visible camera feed"), class = "factor"), timegroup = structure(c(2L, 
    1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 2L, 2L, 1L, 2L, 1L, 1L, 1L, 
    2L, 2L, 1L, 2L), .Label = c("Day", "Evening"), class = "factor"), 
    day = structure(c(4L, 2L, 6L, 6L, 5L, 1L, 4L, 2L, 6L, 6L, 
    6L, 7L, 7L, 5L, 5L, 5L, 1L, 1L, 4L, 4L), .Label = c("Friday", 
    "Monday", "Saturday", "Sunday", "Thursday", "Tuesday", "Wednesday"
    ), class = "factor"), user_reported_percent = c(83.3333333333333, 
    66.6666666666667, 66.6666666666667, 66.6666666666667, 50, 
    66.6666666666667, 50, 50, 66.6666666666667, 50, 83.3333333333333, 
    50, 66.6666666666667, 50, 50, 50, 50, 50, 66.6666666666667, 
    50), machine_percent = c(30.0071, 7.24054, 0.531004, 11.4495, 
    0.633251, 13.6969, 6.30807, 0.375168, 0.635872, 0.651155, 
    0.426338, 0.71563, 15.0509, 0.456111, 12.1912, 12.1912, 11.7268, 
    0.447239, 11.7741, 0.612912)), .Names = c("esmFollValue", 
"esmHappValue", "deviceId", "timestamp", "group", "cameraFeed", 
"timegroup", "day", "user_reported_percent", "machine_percent"
), row.names = c(NA, 20L), class = "data.frame")

I would like to end up with one percentage value per day.

1 Answer:

Answer 0: (score: 0)

With @nicola's help, I used this:

aggregate(d[, c("user_reported_percent", "machine_percent")], by = list(d$day), FUN = median)

Everything works now. Thanks a lot!
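For anyone hitting the same "arguments must have same length" error, here is a minimal sketch of what probably goes wrong in the original call and why subsetting the data frame avoids it. The toy data frame below is hypothetical (it is not the dataset from the question); only the column names are borrowed from it. The formula variant at the end is an alternative spelling, not something the answer itself used.

```r
# Toy data with made-up values; only the column names mirror the question.
d <- data.frame(
  day = factor(c("Mon", "Mon", "Mon", "Tue", "Tue")),
  user_reported_percent = c(50, 70, 90, 40, 60),
  machine_percent = c(1, 3, 20, 10, 30)
)

# c() concatenates the two columns into a single vector of length
# 2 * nrow(d), so it no longer lines up with the grouping vector d$day.
length(c(d$user_reported_percent, d$machine_percent))  # 10
length(d$day)                                           # 5

# Passing the columns as a data frame keeps one row per observation,
# so each column is summarised separately within each day.
res <- aggregate(d[, c("user_reported_percent", "machine_percent")],
                 by = list(day = d$day), FUN = median)

# Alternative: the formula interface of aggregate() with cbind().
res2 <- aggregate(cbind(user_reported_percent, machine_percent) ~ day,
                  data = d, FUN = median)

print(res)
```

Naming the grouping element (`list(day = d$day)`) is optional; without it the grouping column in the result is called `Group.1`.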