ggplot2: comparing 2 groups by the fraction of their members

Question

Let's say we have 10,000 users split into two groups: lvl beginner and lvl pro.

Each user has a rank, from 1 to 20.

The data frame df:

# beginners: 7000 users, lower ranks more likely
n <- 7000
user.id <- 1:n
lvl <- "beginer"
rank <- sample(1:20, n, replace = TRUE,
               prob = seq(.9, 0.1, length.out = 20))
df.beginer <- data.frame(user.id, rank, lvl)

# pros: 3000 users, flatter rank distribution
n <- 3000
user.id <- 1:n
lvl <- "pro"
rank <- sample(1:20, n, replace = TRUE,
               prob = seq(.9, 0.3, length.out = 20))
df.pro <- data.frame(user.id, rank, lvl)

library(dplyr)
df <- bind_rows(df.beginer, df.pro)
# as_tibble() replaces the deprecated tbl_df(); count = users per lvl and rank
df2 <- as_tibble(df) %>% group_by(lvl, rank) %>% mutate(count = n())

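Question 1: I need a bar plot comparing the two groups side by side, but with percentages instead of counts, so that each group is on the same scale (100% per group).

The plot I have so far: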

library(ggplot2)
plot <- ggplot(df2, aes(rank))
plot + geom_bar(aes(fill=lvl),  position="dodge")

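Question 2:

I need a line plot comparing the two groups, so there will be two lines, again with percentages instead of counts, so that each group is on the same scale (100% per group).

The plot I have so far: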

plot + geom_line(aes(y=count, color=lvl))

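Question 3:

Let's say the ranks are cumulative, so a user with rank 3 also has ranks 1 and 2, and a user with rank 20 has every rank from 1 to 20.

So when plotting, I want the plot to start at rank 1 with 100% of the users, drop at rank 2, drop further at rank 3, and so on.

I got all of this done in Tableau, but I really don't like it and want to show myself that R can handle all of it.

Thanks!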

Answer 1

Three questions, three solutions:

Question 1 - Calculate the percentage and use `geom_col`

df %>%
  group_by(rank, lvl) %>%
  summarise(count = n()) %>%                  # users per rank and level
  group_by(lvl) %>%
  mutate(count_perc = count / sum(count)) %>% # share within each level (sums to 1)
  ggplot(aes(x = rank, y = count_perc)) +
  geom_col(aes(fill = lvl), position = "dodge")
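
A quick way to convince yourself that the normalisation really is per group: the shares for each `lvl` should add up to 1. The snippet below is only a sanity-check sketch, not part of the original answer. The `toy` data frame, the seed, and the names `shares` and `check` are made up for illustration, and `count()` is used as a shorthand for the `group_by()` plus `summarise(n())` step.

```r
library(dplyr)

set.seed(42)  # assumption: fixed seed, only so the toy data is reproducible

# Hypothetical small data set with the same columns as df
toy <- data.frame(
  lvl  = rep(c("beginer", "pro"), times = c(70, 30)),
  rank = sample(1:20, 100, replace = TRUE),
  stringsAsFactors = FALSE
)

# Users per level and rank, then each rank's share within its level
shares <- toy %>%
  count(lvl, rank, name = "count") %>%
  group_by(lvl) %>%
  mutate(count_perc = count / sum(count)) %>%
  ungroup()

# One row per level; total should be 1 for both
check <- shares %>%
  group_by(lvl) %>%
  summarise(total = sum(count_perc))

print(check)
```

If `total` were 0.7 and 0.3 instead, the division would have been done over the whole data set rather than within each level.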

Question 2 - Almost the same as Question 1, just use `geom_line` instead of `geom_col`

df %>%
  group_by(rank, lvl) %>%
  summarise(count = n()) %>%                  # users per rank and level
  group_by(lvl) %>%
  mutate(count_perc = count / sum(count)) %>% # share within each level (sums to 1)
  ggplot(aes(x = rank, y = count_perc)) +
  geom_line(aes(colour = lvl))                # one line per level
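
Since the question asks for percentages, it may help to label the y axis as percent rather than as a 0 to 1 proportion. The following is a minimal sketch, assuming the `scales` package is available (it is installed as a dependency of ggplot2). The `shares` data frame here is a hypothetical, already-aggregated stand-in for the piped result above.

```r
library(ggplot2)

# Hypothetical pre-computed shares, same shape as the piped result above
shares <- data.frame(
  rank = rep(1:4, times = 2),
  lvl = rep(c("beginer", "pro"), each = 4),
  count_perc = c(0.4, 0.3, 0.2, 0.1, 0.25, 0.25, 0.25, 0.25),
  stringsAsFactors = FALSE
)

# Same line plot, with the y axis labelled as percentages
p <- ggplot(shares, aes(x = rank, y = count_perc)) +
  geom_line(aes(colour = lvl)) +
  scale_y_continuous(labels = scales::percent)

# The labeller on its own: turns a proportion into a percent string
example_label <- scales::percent(0.25, accuracy = 1)
```

The same `scale_y_continuous()` line can be added to the `geom_col` plot from Question 1.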

Question 3 - Use `arrange` and `cumsum`

df %>%
  group_by(lvl, rank) %>%
  summarise(count = n()) %>%                   # users per level and rank
  group_by(lvl) %>%
  arrange(desc(rank)) %>%                      # highest rank first
  mutate(cumulative_count = cumsum(count)) %>% # users at this rank or higher, per level
  mutate(cumulative_count_perc = cumulative_count / max(cumulative_count)) %>% # divide by the level total
  ggplot(aes(x = rank, y = cumulative_count_perc)) +
  geom_line(aes(colour = lvl))
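
To see why sorting in descending order and then applying `cumsum` gives the right curve: under the question's reading, the value at rank r should be the share of a level's users whose rank is r or higher. The sketch below compares the `arrange` plus `cumsum` recipe against that definition computed directly. It is an illustrative check, not part of the original answer; the `toy` data, the seed, and the names `cum` and `direct` are assumptions.

```r
library(dplyr)

set.seed(1)  # assumption: fixed seed so the toy data is reproducible

# Hypothetical data set with the same columns as df
toy <- data.frame(
  lvl  = rep(c("beginer", "pro"), times = c(700, 300)),
  rank = sample(1:20, 1000, replace = TRUE),
  stringsAsFactors = FALSE
)

# Same recipe as the answer: count, sort descending, cumulate within each level
cum <- toy %>%
  count(lvl, rank, name = "count") %>%
  group_by(lvl) %>%
  arrange(desc(rank), .by_group = TRUE) %>%
  mutate(cum_share = cumsum(count) / sum(count)) %>%
  ungroup()

# Direct definition: share of a level's users with rank >= r
direct <- mapply(
  function(l, r) mean(toy$rank[toy$lvl == l] >= r),
  cum$lvl, cum$rank,
  USE.NAMES = FALSE
)

# TRUE when both approaches agree
print(isTRUE(all.equal(cum$cum_share, direct)))
```

The lowest rank present in each level always ends up at a share of 1, which is the "start at 100%" behaviour the question asks for.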
