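Barplot of dplyr summarized values

Time: 2018-05-15 15:58:41

Tags: r plot dplyr

I have data on top-three rankings. I am trying to create a plot with the column names (cost/products) on the x-axis and the frequency as the y value (ideally the relative frequency, but I don't know how to get that in dplyr).

I am trying to plot values that were summarized in dplyr. I have a dplyr data frame that looks like this: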

likelyReasonFreq<-    LikelyRenew_Reason %>%
      filter(year==3)%>%
      filter(status==1)%>%
      summarize(costC = count(cost), 
                productsC = count(products))



   > likelyReasonFreq
          costC.x   costC.freq   productsC.x  productsC.freq
     1       1         10           1             31
     2       2         11           2             40
     3       3         17           3             30
     4      NA        149          NA             86

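I am trying to create a bar plot showing the total (summed) frequency for cost and products. So the frequency for cost would be the number of times it was ranked 1, 2 or 3, which is 38. Basically I am summing rows 1:3 (for products it would be 101), excluding the NA values.

I don't know how to approach this. Any ideas?

Below is the variable likelyReasonFreq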

> dput(head(likelyReasonFreq))
 structure(list(costC = structure(list(x = c(1, 2, 3, NA), freq = c(10L, 
  11L, 17L, 149L)), .Names = c("x", "freq"), row.names = c(NA, 
  4L), class = "data.frame"), productsC = structure(list(x = c(1, 
  2, 3, NA), freq = c(31L, 40L, 30L, 86L)), .Names = c("x", "freq"
  ), row.names = c(NA, 4L), class = "data.frame")), .Names = c("costC", 
  "productsC"), row.names = c(NA, 4L), class = "data.frame")

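I appreciate any advice!

1 Answer:

Answer 0: (score: 2)

Your data structure is a bit awkward to work with: each column is itself a data frame. You can use str or glimpse to see the problem. You can fix it as shown below, and then plot it.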

> str(df)
'data.frame':   4 obs. of  2 variables:
 $ costC    :'data.frame':  4 obs. of  2 variables:
  ..$ x   : num  1 2 3 NA
  ..$ freq: int  10 11 17 149
 $ productsC:'data.frame':  4 obs. of  2 variables:
  ..$ x   : num  1 2 3 NA
  ..$ freq: int  31 40 30 86
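
As a quick check of that nesting, here is a minimal base R sketch. It rebuilds the data from the dput output in the question (with the inner data frames written out via data.frame() for readability, which is my own shorthand) and confirms that both columns are data frames rather than plain vectors.

```r
# Rebuild the nested data frame from the question's dput output.
df <- structure(list(
  costC     = data.frame(x = c(1, 2, 3, NA), freq = c(10L, 11L, 17L, 149L)),
  productsC = data.frame(x = c(1, 2, 3, NA), freq = c(31L, 40L, 30L, 86L))
), row.names = c(NA, -4L), class = "data.frame")

# Each column reports the class "data.frame" instead of numeric/integer.
col_classes <- sapply(df, class)
print(col_classes)

# The real values sit one level down, inside each column.
inner_names <- names(df$costC)
print(inner_names)
```

This is why ggplot2 cannot map freq directly: there is no top-level freq column until the structure is flattened.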

Code to use for plotting:

library(ggplot2)
library(tidyverse)

# Each column of df is itself a two-column data frame (x, freq).
# Flatten them into one long table with a Name column, then drop the NA rows.
df <- df %>% map(unnest) %>% bind_rows(.id = "Name") %>% na.omit()

df %>% 
    ggplot(aes(x=Name, y= freq)) +
    geom_col()

I hope this is what you expected, though I'm not entirely sure.
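
To double-check the bar heights, the totals can also be computed explicitly. The sketch below is an alternative I am adding, not the code above: it flattens the data with bind_rows(as.list(df), .id = "Name") instead of map(unnest), and the names long and totals are hypothetical. Since geom_col stacks rows that share the same x value by default, the bars in the plot should match these sums.

```r
library(dplyr)

# Nested input, rebuilt from the question's dput output.
df <- structure(list(
  costC     = data.frame(x = c(1, 2, 3, NA), freq = c(10L, 11L, 17L, 149L)),
  productsC = data.frame(x = c(1, 2, 3, NA), freq = c(31L, 40L, 30L, 86L))
), row.names = c(NA, -4L), class = "data.frame")

# as.list() turns df into a named list of its two inner data frames;
# bind_rows() stacks them and records the list names in a Name column.
long <- bind_rows(as.list(df), .id = "Name")

# Sum the frequencies of ranks 1 to 3 per group, leaving out the NA row.
totals <- long %>%
  filter(!is.na(x)) %>%
  group_by(Name) %>%
  summarise(total = sum(freq))

print(totals)
```

This gives 38 for costC and 101 for productsC, the numbers mentioned in the question.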

Input data

df <- structure(list(costC = structure(list(x = c(1, 2, 3, NA), freq = c(10L, 
  11L, 17L, 149L)), .Names = c("x", "freq"), row.names = c(NA, 
  4L), class = "data.frame"), productsC = structure(list(x = c(1, 
  2, 3, NA), freq = c(31L, 40L, 30L, 86L)), .Names = c("x", "freq"
  ), row.names = c(NA, 4L), class = "data.frame")), .Names = c("costC", 
  "productsC"), row.names = c(NA, 4L), class = "data.frame")

Output

[image: bar chart of freq by Name]

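Added after the OP's request:

Here, instead of removing the NAs, I replaced them with a new value, '4'. To get the relative sum within each group, I used cumsum to accumulate the frequencies of ranks 1 to 3, then divided by the total of each group to get the relative frequency.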

# Start again from the original nested input data (see "Input data" above).
df <- df %>% map(unnest) %>% bind_rows(.id = "Name")

df[is.na(df$x),"x"] <- 4

df %>% 
    group_by(Name) %>% 
    mutate(sum_Freq = sum(freq), cum_Freq = cumsum(freq)) %>% 
    filter(x == 3) %>% 
    mutate(new_x = cum_Freq*100/sum_Freq) %>% 
    ggplot(aes(x=Name, y = new_x)) +
    geom_col()
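
The same percentages can be obtained without the cumsum step. The following is only a sketch of an alternative, assuming the NA row should count towards the denominator as in the code above. The column names ranked, total and pct are hypothetical, and the flattening again uses bind_rows(as.list(df), .id = "Name") rather than map(unnest).

```r
library(dplyr)

# Nested input, rebuilt from the question's dput output.
df <- structure(list(
  costC     = data.frame(x = c(1, 2, 3, NA), freq = c(10L, 11L, 17L, 149L)),
  productsC = data.frame(x = c(1, 2, 3, NA), freq = c(31L, 40L, 30L, 86L))
), row.names = c(NA, -4L), class = "data.frame")

rel <- bind_rows(as.list(df), .id = "Name") %>%
  group_by(Name) %>%
  summarise(
    ranked = sum(freq[!is.na(x)]),   # times ranked 1, 2 or 3
    total  = sum(freq),              # all responses, including the NA row
    pct    = 100 * ranked / total    # relative frequency in percent
  )

print(rel)
```

The resulting rel table can be passed to ggplot(aes(x = Name, y = pct)) + geom_col() in the same way as above. For this data, pct is 100 * 38 / 187 for costC and 100 * 101 / 187 for productsC.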