Creating a new column with percentages in a data frame

Time: 2016-10-22 19:21:18

Tags: r ggplot2

I have the following data frame:

dput(df1)

structure(list(month = c(1, 1, 2, 2, 3, 4), transaction_type = c("AAA", 
"BBB", "BBB", "CCC", 
"DDD", "AAA"), max_wt_per_month = c(54.9, 
51.6833333333333, 52.3333333333333, 49.4666666666667, 49.85, 
48.5833333333333), min_wt_per_month = c(0, 0, 0, 0, 0, 0), avg_wt_per_month = c(8.41701333107861, 
7.65211141060198, 6.44184012508551, 7.74798927613941, 7.4360566888844, 
7.50611319574734), prop = c(Inf, Inf, Inf, Inf, Inf, Inf)), .Names = c("month", 
"transaction_type", "max_wt_per_month", "min_wt_per_month", "avg_wt_per_month", 
"prop"), row.names = c(NA, -6L), class = c("grouped_df", "tbl_df", 
"tbl", "data.frame"), vars = list(month), drop = TRUE, indices = list(
    0:5), group_sizes = 6L, biggest_group_size = 6L, labels = structure(list(
    month = 1), row.names = c(NA, -1L), class = "data.frame", vars = list(
    month), drop = TRUE, .Names = "month"))

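I want to create the column prop, which holds each row's maximum waiting time as a percentage of the total for its month. When I run the code below, I get Inf values in most rows (this is especially noticeable in the real dataset):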

my_fun=function(vec){ 
  100*as.numeric(vec[3]) / 
    sum(with(data_merged_transactions, ifelse(month == vec[1], max_wt_per_month, 0))) }
data_merged_transactions$prop=apply(data_merged_transactions , 1 , my_fun)

Ultimately I need to create a filled area chart in which the areas add up to 100%:

ggplot(data_merged_transactions, aes(x=month, y=prop, fill=transaction_type)) + 
  geom_area(alpha=0.6 , size=1, colour="black")

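Why do I get Inf if the sum is not equal to 0? Also, is it possible to create a filled area chart where the month is a factor (Jan, Feb, etc.) rather than a number? I tried replacing the month IDs with month names, but then I got very thin bars instead of filled areas.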

1 Answer:

Answer 0: (score: 1)

Is this what you are looking for?

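library(tidyverse)

# Sum max_wt_per_month within each month, join the totals back onto
# the original rows, then compute each row's share of its month.
# prop is a proportion between 0 and 1; multiply by 100 for a percentage.
df1_tidy <- df1 %>% 
                group_by(month) %>% 
                summarise(SUM = sum(max_wt_per_month)) %>%
                full_join(df1, by = "month") %>% 
                mutate(prop = max_wt_per_month / SUM)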
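As for why the original code returns Inf: one likely cause (an assumption, since the full dataset is not shown) is that apply() converts the data frame to a character matrix because of the transaction_type column. Numeric columns are then formatted to a common width, so with two-digit months present, month 1 becomes " 1" and no longer matches anything in `month == vec[1]`. The denominator is then 0 and the result is Inf. The sketch below reproduces this on a hypothetical toy data frame (`toy` and `my_fun_toy` are made-up names) and shows a base R alternative using ave().

```r
# Hypothetical toy data: months 1, 2 and 12, so the month column
# contains values of different widths.
toy <- data.frame(
  month = c(1, 1, 2, 12),
  transaction_type = c("AAA", "BBB", "AAA", "AAA"),
  max_wt_per_month = c(50, 50, 40, 30),
  stringsAsFactors = FALSE
)

# apply() works on as.matrix(toy), which is a character matrix here.
# Numeric columns are formatted to a common width: 1 becomes " 1".
month_as_text <- unname(as.matrix(toy)[, "month"])

# Same logic as my_fun in the question, pointed at the toy data.
my_fun_toy <- function(vec) {
  100 * as.numeric(vec[3]) /
    sum(ifelse(toy$month == vec[1], toy$max_wt_per_month, 0))
}

# Rows whose padded month string matches nothing divide by zero.
prop_apply <- apply(toy, 1, my_fun_toy)

# Base R alternative: divide by the per-month total computed with ave().
prop_ave <- 100 * toy$max_wt_per_month /
  ave(toy$max_wt_per_month, toy$month, FUN = sum)
```

Working on whole columns rather than row by row avoids the conversion to character entirely.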


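# Keep month numeric so the areas connect, and relabel the axis instead.
# breaks is set explicitly so the four labels always line up with months 1 to 4.
ggplot(data = df1_tidy, 
       aes(x = month, 
           y = prop, 
           fill = transaction_type)) + 
  geom_area(alpha = 0.6, 
            size = 1, 
            colour = "black") +
  scale_x_continuous(breaks = 1:4,
                     labels = c("Jan", "Feb", "Mar", "Apr"))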
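If the month really has to be a factor, the thin bars probably come from ggplot2 grouping by every combination of discrete variables, so each month/type pair becomes its own one-point area. A minimal sketch, assuming that is the cause: set the group aesthetic to transaction_type explicitly. The data frame `toy` and the column `month_name` are hypothetical and used only for illustration.

```r
library(ggplot2)

# Hypothetical toy data: two transaction types observed in every month.
toy <- data.frame(
  month = rep(1:4, each = 2),
  transaction_type = rep(c("AAA", "BBB"), times = 4),
  max_wt_per_month = c(50, 50, 40, 60, 30, 70, 20, 80)
)

# Share of each month's total, in percent.
toy$prop <- 100 * toy$max_wt_per_month /
  ave(toy$max_wt_per_month, toy$month, FUN = sum)

# Month as a factor, labelled with R's built-in month.abb and
# with levels in calendar order rather than alphabetical order.
toy$month_name <- factor(month.abb[toy$month], levels = month.abb[1:4])

# group = transaction_type connects the points of one type across
# the discrete x axis, so each type is drawn as a single area.
p <- ggplot(toy, aes(x = month_name, y = prop,
                     fill = transaction_type,
                     group = transaction_type)) +
  geom_area(alpha = 0.6, colour = "black")
```

Printing `p` draws the chart. Areas stack cleanly only when every transaction type has a value in every month, which holds for the toy data but not for the sample in the question.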