I'm trying to aggregate a dataset in R using the median.
d <- aggregate(c(d$user_reported_percent, d$machine_percent),
by = list(d$day), FUN=median, simplify = TRUE, drop = TRUE)
But R keeps complaining, and I'm not sure whether aggregating with the median even makes sense.
One of the errors R gives me: Error in aggregate.data.frame(as.data.frame(x), ...) : arguments must have same length
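The length error comes from c(): it concatenates the two columns into a single vector twice as long as the data, while the grouping vector d$day still has one entry per row. The following minimal sketch uses a small hypothetical data frame (not the question's dataset) to show the mismatch and the resulting error.

```r
# Hypothetical toy data standing in for d
d <- data.frame(
  day = factor(c("Monday", "Monday", "Tuesday")),
  user_reported_percent = c(50, 70, 60),
  machine_percent = c(1, 3, 2)
)

# c() joins both columns into ONE vector of length 2 * nrow(d)
x <- c(d$user_reported_percent, d$machine_percent)
length(x)      # 6
length(d$day)  # 3: the grouping vector is only half as long

# aggregate() therefore stops with "arguments must have same length"
res <- tryCatch(
  aggregate(x, by = list(d$day), FUN = median),
  error = function(e) conditionMessage(e)
)
print(res)
```

The choice of FUN does not matter here; the check on the lengths of x and by fails before median is ever called.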
Then I tried using mutate to at least compute the median:
d <- d %>% group_by(day) %>% mutate(median=median(user_reported_percent))
The error is: Error: invalid subscript type 'integer'
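Separately from that error (whose cause is not diagnosed here), group_by() followed by mutate() keeps every row and repeats each day's median on it, whereas the stated goal is one value per day. As a minimal sketch, this uses base R to avoid a dplyr dependency: ave() plays the role of group_by + mutate, and tapply() gives one value per group, much like summarise(). The toy data are hypothetical.

```r
# Hypothetical toy data standing in for d
d <- data.frame(
  day = factor(c("Monday", "Monday", "Tuesday")),
  user_reported_percent = c(50, 70, 80)
)

# Analogue of group_by(day) %>% mutate(...): one result per ROW,
# each row receiving the median of its own day
d$median <- ave(d$user_reported_percent, d$day, FUN = median)
print(d)

# Analogue of summarise(): one result per DAY
per_day <- tapply(d$user_reported_percent, d$day, median)
print(per_day)
```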
I would appreciate any help! Thanks a lot!
P.S. With mean, everything works perfectly.
My dataset looks like this:
structure(list(esmFollValue = c(36.00852, 8.688648, 0.6372048,
13.7394, 0.7599012, 16.43628, 7.569684, 0.4502016, 0.7630464,
0.781386, 0.5116056, 0.858756, 18.06108, 0.5473332, 14.62944,
14.62944, 14.07216, 0.5366868, 14.12892, 0.7354944), esmHappValue = c(100L,
80L, 80L, 80L, 60L, 80L, 60L, 60L, 80L, 60L, 100L, 60L, 80L,
60L, 60L, 60L, 60L, 60L, 80L, 60L), deviceId = structure(c(11L,
11L, 11L, 6L, 3L, 15L, 3L, 3L, 15L, 3L, 15L, 15L, 15L, 15L, 3L,
3L, 15L, 3L, 9L, 9L), .Label = c("1e6c1183-af64-4860-b2d6-533cab7afe6c",
"34209e3d-1a82-4f75-95c8-846be8a1be03", "7066f4af-82f3-4369-8f45-70d1ea3d22f2",
"7cf78328-60c5-4564-9dd0-309cb0b3d5ad", "95b11f22-91e8-46d0-88d9-4f197267aa29",
"a0c89d2a-d22d-41d0-a070-b9887d911953", "cde8cc10-7212-4a41-ae9b-bbeb51dbe8ed",
"d150bfa4-0b52-47a0-b450-1eb21aaada53", "d41db7bc-2b81-4111-9b32-a0aab55cb25a",
"d7e8e8c7-5190-4f0b-aa49-72e520bc9aad", "dd1218a2-4e67-4cbf-bf4d-9e288865aa63",
"f093abf9-22e1-47e6-ae5d-1238629d8542", "fae0dd29-2b89-4c1d-b5ad-7858abe122ac",
"feeb0ab0-7d13-4a5c-b0df-58dd85c7f607", "ff883e61-c9a9-4e6b-8b6b-cab3e5535879"
), class = "factor"), timestamp = c(1457272936.882, 1457337998.931,
1457424251.996, 1457429767.632, 1457597635.755, 1457683537.604,
1457861178.161, 1457964712.356, 1458029223.54, 1458046931.652,
1458051135.219, 1458115293.069, 1458133652.503, 1458202019.302,
1458203945.674, 1458203945.787, 1458306790.803, 1458308783.441,
1458460903.755, 1458480932.088), group = structure(c(1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L), .Label = c("groupA", "groupB", "groupC", "groupD"), class = "factor"),
cameraFeed = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("Non-visible camera feed",
"Visible camera feed"), class = "factor"), timegroup = structure(c(2L,
1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 2L, 2L, 1L, 2L, 1L, 1L, 1L,
2L, 2L, 1L, 2L), .Label = c("Day", "Evening"), class = "factor"),
day = structure(c(4L, 2L, 6L, 6L, 5L, 1L, 4L, 2L, 6L, 6L,
6L, 7L, 7L, 5L, 5L, 5L, 1L, 1L, 4L, 4L), .Label = c("Friday",
"Monday", "Saturday", "Sunday", "Thursday", "Tuesday", "Wednesday"
), class = "factor"), user_reported_percent = c(83.3333333333333,
66.6666666666667, 66.6666666666667, 66.6666666666667, 50,
66.6666666666667, 50, 50, 66.6666666666667, 50, 83.3333333333333,
50, 66.6666666666667, 50, 50, 50, 50, 50, 66.6666666666667,
50), machine_percent = c(30.0071, 7.24054, 0.531004, 11.4495,
0.633251, 13.6969, 6.30807, 0.375168, 0.635872, 0.651155,
0.426338, 0.71563, 15.0509, 0.456111, 12.1912, 12.1912, 11.7268,
0.447239, 11.7741, 0.612912)), .Names = c("esmFollValue",
"esmHappValue", "deviceId", "timestamp", "group", "cameraFeed",
"timegroup", "day", "user_reported_percent", "machine_percent"
), row.names = c(NA, 20L), class = "data.frame")
I want one percentage value per day.
Answer
With @nicola's help, I used this:
aggregate(d[,c("user_reported_percent","machine_percent")],by = list(d$day), FUN=median)
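This works because d[, c(...)] is a data frame whose columns each have nrow(d) entries, so they line up with the grouping vector and each column is summarised separately. The following minimal sketch runs that call on hypothetical toy data and compares it with the formula interface of aggregate(), where cbind() on the left-hand side selects the columns. Naming the by element (day = d$day) is an addition here; it labels the grouping column day instead of the default Group.1.

```r
# Hypothetical toy data standing in for d
d <- data.frame(
  day = factor(c("Monday", "Monday", "Tuesday", "Tuesday")),
  user_reported_percent = c(50, 70, 60, 80),
  machine_percent = c(1, 3, 2, 6)
)

# The answer's approach: a data frame of columns plus a same-length grouping list
a <- aggregate(d[, c("user_reported_percent", "machine_percent")],
               by = list(day = d$day), FUN = median)

# Equivalent formula interface: columns via cbind(), grouping after ~
b <- aggregate(cbind(user_reported_percent, machine_percent) ~ day,
               data = d, FUN = median)

print(a)
print(b)
```

Both produce one row per day, with Monday medians 60 and 2 and Tuesday medians 70 and 4. Note that the formula method drops rows with NA by default (na.action = na.omit).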
Everything works now. Thanks a lot!