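I have data stored in a data frame, where the first column is the date and the second column is an individual weight. Here is a sample of the data: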
df <- data.frame(
date = c("2019-01-01", "2019-01-01", "2019-01-01", "2019-01-01",
"2019-01-01", "2019-01-01", "2019-01-01", "2019-01-01",
"2019-01-01", "2019-01-01", "2019-01-02", "2019-01-02", "2019-01-02",
"2019-01-02", "2019-01-02", "2019-01-02", "2019-01-02",
"2019-01-02", "2019-01-02", "2019-01-02"),
weight = c(2174.8, 2174.8, 2174.8, 8896.53, 8896.53, 2133.51, 2133.51,
2892.32, 2892.32, 2892.32, 2892.32, 5287.78, 5287.78, 6674.03,
6674.03, 6674.03, 6674.03, 6674.03, 5535.11, 5535.11)
)
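I want to first run simple summary statistics for each date, and then find the number of records whose weight falls within a given range, where the categories are defined as percentages of the total weight range. Finally, the record count for each category should be stored in its own column: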
Lowest 10%
10-20%
20-40%
40-60%
60-80%
80-90%
90-100%
The logic for each threshold is: MinWeight + (MaxWeight - MinWeight) * X%
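To make the threshold formula concrete, here is a minimal base R sketch that evaluates it for the 2019-01-01 weights from the sample above. The names `w`, `pct`, and `thresholds` are only illustrative; they are not part of the original question.

```r
# Weights for 2019-01-01, copied from the sample data above
w <- c(2174.8, 2174.8, 2174.8, 8896.53, 8896.53, 2133.51, 2133.51,
       2892.32, 2892.32, 2892.32)

# Upper bound of each category, as a percentage of the weight range
pct <- c(10, 20, 40, 60, 80, 90, 100)

# Threshold = MinWeight + (MaxWeight - MinWeight) * X%
thresholds <- min(w) + (max(w) - min(w)) * pct / 100
names(thresholds) <- paste0(pct, "%")

print(round(thresholds, 2))
```

For this date the range is 8896.53 - 2133.51 = 6763.02, so the 10% threshold works out to roughly 2809.81, and any record at or below that weight would fall into "Lowest 10%".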
Here is my expected result (I am showing only two of the percentage-range columns):
library(dplyr)
df %>%
  group_by(date) %>%
  summarise(mean(weight), min(weight), max(weight))
date `mean(weight)` `min(weight)` `max(weight)` `Lowest 10%` `10-20%`
2019-01-01 3726. 2134. 8897. num records. num records.
Answer 0 (score: 2)
Check this solution:
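library(tidyverse)
library(wrapr)  # provides the %.>% "dot arrow" pipe used below
df %>%
  group_by(date) %>%
  mutate(
    rn = row_number(),  # unique row id per date so spread() keeps every row
    # rescale each weight to 0-100% of that date's weight range
    temp = weight - min(weight),
    temp = (temp / max(temp)) * 100,
    # bin into 10% intervals, then turn labels such as "(10,20]" into "10-20%"
    temp = cut(temp, seq(0, 100, 10), include.lowest = TRUE),
    temp = str_remove(temp, '\\(|\\[') %>%
      str_replace(',', '-') %>%
      str_replace('\\]', '%'),
    one = 1
  ) %>%
  # one indicator column per bin that actually occurs in the data
  spread(temp, one, fill = 0) %.>%
  left_join(
    summarise(.,
      `mean(weight)` = mean(weight),
      `min(weight)` = min(weight),
      `max(weight)` = max(weight)
    ),
    # summing the indicator columns gives the record count per bin
    summarise_at(., vars(matches('\\d+-\\d+.')), sum),
    by = 'date'
  )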
Output:
date `mean(weight)` `min(weight)` `max(weight)` `0-10%` `10-20%` `60-70%` `90-100%`
<fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 2019-01-01 3726. 2134. 8897. 5 3 0 2
2 2019-01-02 5791. 2892. 6674. 1 0 4 5
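The label clean-up in the solution above may be easier to follow in isolation. The following is a small sketch of that single step, using base R `sub()` as a stand-in for the stringr calls; the input vector `pct` is made up for illustration.

```r
# Example percentages of the weight range (illustrative values only)
pct <- c(0, 0.61, 11.22, 100)

# cut() produces interval labels such as "[0,10]" and "(10,20]"
bins <- as.character(cut(pct, seq(0, 100, 10), include.lowest = TRUE))

# Drop the opening bracket, swap the comma for a dash,
# and replace the closing bracket with a percent sign
labels <- sub("\\]", "%", sub(",", "-", sub("\\(|\\[", "", bins)))

print(labels)
```

Because `spread()` only creates columns for labels that occur in the data, bins with no records in any date (for example 20-30%) do not appear in the output above.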
Answer 1 (score: 2)
This can be done as follows:
library(tidyverse)
df %>%
  group_by(date) %>%
  mutate(
    # rescale to 0-100% of each date's weight range and cut into 10 equal bins
    wrange = cut((weight - min(weight)) / (max(weight - min(weight))) * 100, 10,
      labels = paste(
        seq(0, 90, by = 10),
        paste0(seq(10, 100, by = 10), "%"),
        sep = '-')
    )
  ) %>%
  # braces stop %>% from also inserting the data as an extra first argument
  {
    left_join(
      # funs() is deprecated since dplyr 0.8.0; a named list replaces it
      x = summarise_at(., vars(weight), list(mean = mean, min = min, max = max)),
      y = count(., wrange) %>% complete(wrange, fill = list(n = 0)) %>% spread(wrange, n),
      by = 'date'
    )
  } %>%
  rename_at(vars(matches("mean|min|max")), ~ paste0(., "(weight)"))
Which outputs:
# date mean(weight) min(weight) max(weight) 0-10% 10-20% 20-30% 30-40% 40-50%
# 1 2019-01-01 3726.144 2133.51 8896.53 5 3 0 0 0
# 2 2019-01-02 5790.825 2892.32 6674.03 1 0 0 0 0
# 50-60% 60-70% 70-80% 80-90% 90-100%
# 0 0 0 0 2
# 0 4 0 0 5
(I reformatted the output so that all of the data is visible.)
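Both answers use uniform 10% bins, whereas the question lists uneven categories (Lowest 10%, 10-20%, 20-40%, and so on). As a rough sketch of how the same idea might be adapted, the base R code below passes the question's own break points to `cut()`. The helper `summarise_date` and the variables `pct_breaks` and `range_labels` are hypothetical names introduced here, and the sketch assumes that a value sitting exactly on a threshold belongs to the lower category.

```r
# Sample data from the question (10 records per date)
df <- data.frame(
  date = rep(c("2019-01-01", "2019-01-02"), each = 10),
  weight = c(2174.8, 2174.8, 2174.8, 8896.53, 8896.53, 2133.51, 2133.51,
             2892.32, 2892.32, 2892.32, 2892.32, 5287.78, 5287.78, 6674.03,
             6674.03, 6674.03, 6674.03, 6674.03, 5535.11, 5535.11)
)

# Category boundaries from the question, in percent of the weight range
pct_breaks <- c(0, 10, 20, 40, 60, 80, 90, 100)
range_labels <- c("Lowest 10%", "10-20%", "20-40%", "40-60%",
                  "60-80%", "80-90%", "90-100%")

# Hypothetical helper: summary statistics plus record counts for one date
summarise_date <- function(w) {
  # position of each weight within the date's range, as a percentage
  pct <- (w - min(w)) / (max(w) - min(w)) * 100
  bins <- cut(pct, breaks = pct_breaks, labels = range_labels,
              include.lowest = TRUE)
  # tabulate() counts every category, including empty ones
  counts <- setNames(tabulate(bins, nbins = length(range_labels)), range_labels)
  c(mean = mean(w), min = min(w), max = max(w), counts)
}

# One row per date, one column per statistic or category
result <- t(sapply(split(df$weight, df$date), summarise_date))
print(result)
```

Note that this sketch does not handle a date on which every weight is identical: the range is then zero and the percentage is undefined, so such dates would need separate treatment.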