In R

Time: 2018-11-29 16:11:47

Tags: r pivot-table

I'm sure this is really easy, but as an R newbie I'm tearing my hair out.

I have a data frame:

df <- data.frame("Factor_1" = c(1,2,1,1,2,1,1,2,1,2,1,2),
                 "Factor_2" = c("M", "F", "M", "F","M", "F","M", "F","M", "F","M", "F"),
                 "Denominator" = c(1,1,1,1,1,1,1,1,1,1,1,1),
                 "Numerator" = c(0,0,1,0,0,0,1,0,0,0,1,1))

I want to create some graphs:

(1) Sum(Denominator) - split by Factor_1
(2) Sum(Numerator)/Sum(Denominator) - split by Factor_1

(so Factor_1 appears on the horizontal axis)

(and then repeat for Factor_2)

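Ideally (1) and (2) would have different vertical axes, with (1) drawn as columns and (2) as a line.

It would look a bit like the attached picture (from an Excel pivot table/chart):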

Pivot Graph

2 answers:

Answer 0: (score: 0)

library(tidyverse)

df <- data.frame("Factor_1" = c(1,2,1,1,2,1,1,2,1,2,1,2),
             "Factor_2" = c("M", "F", "M", "F","M", "F","M", "F","M", "F","M", "F"),
             "Denominator" = c(1,1,1,1,1,1,1,1,1,1,1,1),
             "Numerator" = c(0,0,1,0,0,0,1,0,0,0,1,1))




df %>%
  group_by(Factor_1) %>%
  summarize(sum_num = sum(Numerator), sum_dem = sum(Denominator)) %>%
  mutate(ratio = sum_num / sum_dem)

# A tibble: 2 x 4
  Factor_1 sum_num sum_dem ratio
     <dbl>   <dbl>   <dbl> <dbl>
1        1       3       7 0.429
2        2       1       5 0.2  

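Does this help?

Answer 1: (score: 0)

Rather than thinking about this as a pivot in Excel, treat it as a great opportunity to use the tidyverse!

Let's set up the environment: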

library(tidyverse)   # This loads dplyr (data wrangling) and ggplot2 (visualization)

df <- data.frame("Factor_1" = c(1,2,1,1,2,1,1,2,1,2,1,2),   
             "Factor_2" = c("M", "F", "M", "F","M", "F","M", "F","M", "F","M", "F"),   
             "Denominator" = c(1,1,1,1,1,1,1,1,1,1,1,1),   
             "Numerator" = c(0,0,1,0,0,0,1,0,0,0,1,1))   

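Let's start with Factor_1. For each Factor_1 group we want the sum of the numerator, the sum of the denominator, and the numerator/denominator ratio. We need to tell R that we want to group by Factor_1. Then we can use the summarize() function from the dplyr package to do most of the heavy lifting.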

summaryFactor1 <- df %>%                     # Save as new object: summaryFactor1
group_by(Factor_1) %>%                       # Group data by Factor_1, and for each:
summarize(sum_num = sum(Numerator),          # sum Numerator
        sum_den = sum(Denominator)) %>%      # sum Denominator
mutate(ratio = sum_num/sum_den)              # and create new column for num/den

This gives us:

summaryFactor1
#  A tibble: 2 x 4
  Factor_1 sum_num sum_den ratio
     <dbl>   <dbl>   <dbl> <dbl>
1        1       3       7 0.429
2        2       1       5 0.2  
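
As a cross-check, the same group sums can be computed without the tidyverse. This is a minimal base R sketch using aggregate(); the object name `check` is introduced here purely for illustration, and the result should agree with the tibble above.

df <- data.frame("Factor_1" = c(1,2,1,1,2,1,1,2,1,2,1,2),
                 "Factor_2" = c("M", "F", "M", "F","M", "F","M", "F","M", "F","M", "F"),
                 "Denominator" = c(1,1,1,1,1,1,1,1,1,1,1,1),
                 "Numerator" = c(0,0,1,0,0,0,1,0,0,0,1,1))

# Sum Numerator and Denominator within each Factor_1 group
# ('check' is a hypothetical name, not part of the original answer)
check <- aggregate(cbind(Numerator, Denominator) ~ Factor_1, data = df, FUN = sum)

# Add the ratio column, mirroring the mutate() step above
check$ratio <- check$Numerator / check$Denominator
check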

To reproduce the graph you're looking for, we take the summaryFactor1 tibble and pass it to ggplot:

summaryFactor1 %>%                        # Use our summary table
ggplot(aes(x = Factor_1)) +               # plot Factor_1 on x-axis, 
geom_col(aes(y = sum_den)) +              # sum_den as columns, 
geom_line(aes(y = ratio))                 # and ratio as a line

Factor_1 summary plot

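Note that there is only one y-axis, so the line showing the ratio is hard to read. The plot you shared from Excel may look nicer, but be wary of misreading the ratio when it shares a panel with the column totals.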
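
If a second vertical axis is still wanted, as in the question, one possible approach is ggplot2's sec_axis(): the ratio is rescaled onto the primary axis, and the secondary axis is labelled with the inverse transformation. The following is only a sketch under assumptions. `scale_factor` is a name introduced here, and setting it to the largest column height (so that a ratio of 1 lines up with the tallest column) is an arbitrary choice, not something the original answer prescribes.

library(dplyr)
library(ggplot2)

df <- data.frame("Factor_1" = c(1,2,1,1,2,1,1,2,1,2,1,2),
                 "Factor_2" = c("M", "F", "M", "F","M", "F","M", "F","M", "F","M", "F"),
                 "Denominator" = c(1,1,1,1,1,1,1,1,1,1,1,1),
                 "Numerator" = c(0,0,1,0,0,0,1,0,0,0,1,1))

summaryFactor1 <- df %>%
  group_by(Factor_1) %>%
  summarize(sum_num = sum(Numerator),
            sum_den = sum(Denominator)) %>%
  mutate(ratio = sum_num / sum_den)

# Assumption: map a ratio of 1 to the height of the tallest column
scale_factor <- max(summaryFactor1$sum_den)

p <- ggplot(summaryFactor1, aes(x = factor(Factor_1))) +
  geom_col(aes(y = sum_den)) +                              # columns on the primary axis
  geom_line(aes(y = ratio * scale_factor, group = 1)) +     # ratio, rescaled to the primary axis
  geom_point(aes(y = ratio * scale_factor)) +
  scale_y_continuous(
    name = "Sum(Denominator)",
    sec.axis = sec_axis(~ . / scale_factor,                 # undo the rescaling for the labels
                        name = "Sum(Numerator) / Sum(Denominator)")
  ) +
  labs(x = "Factor_1")

p

Because the two axes are linked only by this scaling choice, the visual relationship between the line and the columns is arbitrary, so the caution about misreading the ratio still applies.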

We can apply the same logic to Factor_2:

summaryFactor2 <- df %>%                     # Save as new object: summaryFactor2
group_by(Factor_2) %>%                       # Group data by Factor_2, and for each:
summarize(sum_num = sum(Numerator),          # sum Numerator
        sum_den = sum(Denominator)) %>%      # sum Denominator
mutate(ratio = sum_num/sum_den)              # and create new column for num/den

# Let's view the result
summaryFactor2
# A tibble: 2 x 4
 Factor_2 sum_num sum_den ratio
  <fct>      <dbl>   <dbl> <dbl>
1 F              1       6 0.167
2 M              3       6 0.5  

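Before continuing, note that the sum of the denominator is the same (6) for both Factor_2 groups. When we compared ratios across the Factor_1 groups, the denominator sums differed (7 vs. 5); here they match, so this is an easier 1:1 comparison.

Since plotting sum_den across the two groups isn't very insightful...

summaryFactor2 %>%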
ggplot(aes(x = Factor_2)) + 
geom_col(aes(y = sum_den)) 

Factor2 summary plot

Let's plot the ratio instead:

summaryFactor2 %>% 
ggplot(aes(x = Factor_2)) + 
geom_col(aes(y = ratio)) 

Factor_2 ratio plot
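
Since the same steps are repeated for each factor, they could be wrapped in a small helper. The sketch below is illustrative only: `summarise_by` and `summaries` are hypothetical names that do not appear in the answer above, and the helper assumes the data frame has columns named Numerator and Denominator.

library(dplyr)

df <- data.frame("Factor_1" = c(1,2,1,1,2,1,1,2,1,2,1,2),
                 "Factor_2" = c("M", "F", "M", "F","M", "F","M", "F","M", "F","M", "F"),
                 "Denominator" = c(1,1,1,1,1,1,1,1,1,1,1,1),
                 "Numerator" = c(0,0,1,0,0,0,1,0,0,0,1,1))

# Hypothetical helper: summarise Numerator/Denominator by a column given as a string
summarise_by <- function(data, group_col) {
  data %>%
    group_by(across(all_of(group_col))) %>%      # group by the named column
    summarize(sum_num = sum(Numerator),          # sum Numerator
              sum_den = sum(Denominator),        # sum Denominator
              .groups = "drop") %>%
    mutate(ratio = sum_num / sum_den)            # ratio per group
}

# One summary table per factor, stored in a named list
summaries <- lapply(c("Factor_1", "Factor_2"), function(col) summarise_by(df, col))
names(summaries) <- c("Factor_1", "Factor_2")

summaries$Factor_1
summaries$Factor_2

Each element of `summaries` can then be passed to ggplot in the same way as summaryFactor1 and summaryFactor2 above.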