How do you group your data for a histogram?

Date: 2018-12-03 13:15:37

Tags: r ggplot2 grouping histogram

I have a dataset of people's birth years. I want to plot a histogram, but since I am working with a fairly large dataset, I would like to group my data in classes of 5. For example, there are 30 people born in the year 1985, but I want my histogram to show a frequency of 6 for that year.

This is the code I have so far for my histogram.

ggplot(date, aes(date$year)) + 
  geom_histogram(colour = "black") + 
  labs(title = "...", x = "year", y = "frequency")

3 Answers:

Answer 0 (score: 3)

You can just change the labels on the y-axis to reflect the desired transformation:

ggplot(date, aes(year)) + 
  geom_histogram(colour = "black") + 
  labs(title = "...", x = "year", y = "frequency") + 
  scale_y_continuous(labels=function(x) x/5)

Here is an example with some fake data:

[Figure: histogram of the original fake data, untransformed]

[Figure: exactly the same data, with the scale_y_continuous line added]
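The comparison described above can be reproduced with a minimal sketch like the one below. The data frame name `fake_years`, the seed, the year range, and `binwidth = 1` are assumptions made for illustration and are not part of the original answer. Note that the label function only changes the text printed on the y-axis; the bar heights still correspond to the raw counts.

```r
library(ggplot2)

set.seed(42)  # arbitrary seed, only for reproducibility

# Hypothetical fake data: 1000 random birth years
fake_years <- data.frame(year = sample(1980:1990, size = 1000, replace = TRUE))

# Plain histogram with one bin per year
p_raw <- ggplot(fake_years, aes(year)) +
  geom_histogram(binwidth = 1, colour = "black") +
  labs(x = "year", y = "frequency")

# Same plot, but the y-axis labels are divided by 5
p_scaled <- p_raw +
  scale_y_continuous(labels = function(x) x / 5)

# The underlying bin counts are the same in both plots
counts_raw <- ggplot_build(p_raw)$data[[1]]$count
counts_scaled <- ggplot_build(p_scaled)$data[[1]]$count
```

Printing `p_raw` and `p_scaled` side by side should show identical bars with different y-axis labels.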

Answer 1 (score: 2)

With a bar chart:

library(dplyr)
library(ggplot2)

dates_df <- data.frame(year = sample(1950:2018, size = 100000, replace = TRUE)) # randomly generated years

classes <- 5

dates_df %>%
  group_by(year) %>%
  summarise(cnt = n()) %>%
  ggplot(aes(x = year, y = cnt / classes)) +
  geom_col(colour = "black") +
  theme_bw()
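If you would rather keep `geom_histogram` instead of aggregating first, a similar scaling can be sketched with `after_stat()`, which divides the computed bin counts before the bars are drawn. This is an alternative and not part of the original answer. It assumes ggplot2 3.3.0 or later, reuses the `dates_df` and `classes` names with a smaller sample, and sets `binwidth = 1` (an assumption) so that each bar covers one year.

```r
library(ggplot2)

set.seed(1)  # arbitrary seed, only for reproducibility
dates_df <- data.frame(year = sample(1950:2018, size = 10000, replace = TRUE))
classes <- 5

# Divide the bin counts computed by the histogram stat by `classes`
p <- ggplot(dates_df, aes(x = year)) +
  geom_histogram(aes(y = after_stat(count / classes)),
                 binwidth = 1, colour = "black") +
  labs(x = "year", y = "frequency") +
  theme_bw()

# Bar heights as they will be drawn (already divided by `classes`)
heights <- ggplot_build(p)$data[[1]]$y
```

Unlike relabelling the axis, this changes the plotted values themselves, so the heights sum to the number of rows divided by `classes`.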

Answer 2 (score: 1)

You can also try the following:

library(data.table)
library(dplyr)
library(ggplot2)

fake_data <- data.table(name = c('John', 'Peter', 'Alan', 'James', 'Jack', 'Elena', 'Maria'),
                        year = c(2018, 2018, 2018, 2017, 2016, 2017, 2018))

# Count the distinct people per year, then divide that count by 5
fake_data %>%
  group_by(year) %>%
  summarize(numb_people = length(unique(name)),
            number_people_freq = length(unique(name)) / 5) %>%
  as.data.table() %>%
  ggplot(aes(year)) +
  geom_bar(aes(y = number_people_freq), stat = 'identity') +
  labs(title = "...", x = "year", y = "frequency")