Question

Given the data_frame df <- data_frame(X = c('A', 'A', 'B', 'B', 'B'), Y = c('M', 'N', 'M', 'M', 'N')), I need to come up with a data frame that tells us that 50% of A are M, 50% of A are N, 67% of B are M, and 33% of B are N.

I have a little routine that I use to do this, but it looks horrible.

library(tidyverse)
df <- data_frame(X = c('A', 'A', 'B', 'B', 'B'), Y = c('M', 'N', 'M', 'M', 'N')) 
# here we go...
df %>% 
  group_by(X) %>% 
  mutate(n_X = n()) %>% 
  group_by(X, Y) %>% 
  summarise(PERCENT = n() / first(n_X))

Output:

Source: local data frame [4 x 3]
Groups: X [?]

      X     Y   PERCENT
  <chr> <chr>     <dbl>
1     A     M 0.5000000
2     A     N 0.5000000
3     B     M 0.6666667
4     B     N 0.3333333

Is there a better way to do this? Surely I'm missing something.

Answer 1

You can use prop.table:

df %>% 
  group_by(X, Y) %>%
  count() %>%
  mutate(PERCENT = prop.table(n))

Result:

      X     Y     n   PERCENT
  <chr> <chr> <int>     <dbl>
1     A     M     1 0.5000000
2     A     N     1 0.5000000
3     B     M     2 0.6666667
4     B     N     1 0.3333333
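
One caveat, offered as a hedged note rather than a correction: depending on the dplyr version, count() on a data frame grouped by X and Y may return a result that is still grouped by both columns, in which case prop.table(n) is computed per (X, Y) row and comes out as 1 everywhere. A minimal sketch that does not rely on the leftover grouping is to count first and then group by X explicitly. The name res and the use of tibble() instead of data_frame() are choices made for this sketch, not part of the answer above.

```r
library(dplyr)

# Same data as in the question, built with tibble() (an assumption for this sketch)
df <- tibble(X = c('A', 'A', 'B', 'B', 'B'), Y = c('M', 'N', 'M', 'M', 'N'))

res <- df %>%
  count(X, Y) %>%                    # one row per (X, Y) pair, column n holds the count
  group_by(X) %>%                    # proportions are taken within each X
  mutate(PERCENT = n / sum(n)) %>%   # same arithmetic as prop.table(n)
  ungroup()

res
```

Stating the grouping explicitly keeps the denominator (the size of each X group) visible in the pipeline.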

Answer 2

We can use table and rowSums.

Try it in base R:

new_df <- table(df$X, df$Y)
new_df/rowSums(new_df)

#          M         N
#  A 0.5000000 0.5000000
#  B 0.6666667 0.3333333
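
The same row-wise proportions can also be obtained with prop.table and its margin argument, which avoids dividing by rowSums by hand. This is a minimal sketch on the question's data; the names tab and props are made up for the illustration.

```r
# Same data as in the question, as a plain base R data frame
df <- data.frame(X = c('A', 'A', 'B', 'B', 'B'),
                 Y = c('M', 'N', 'M', 'M', 'N'))

tab <- table(df$X, df$Y)

# margin = 1 divides each cell by its row total, so every row sums to 1
props <- prop.table(tab, margin = 1)
props
```

Using margin = 2 instead would give proportions within each Y value, i.e. column-wise.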

Answer 3

Maybe something like this:

base R

tbl <- xtabs(~X+Y, df)
as.data.frame(tbl/rowSums(tbl), responseName = "prop")

data.table

library(data.table)
DT <- data.table(df)[, .N, by = .(X,Y)]
setDT(DT)[, prop := N/sum(N), by = 'X']
DT

#   X Y N      prop
#1: A M 1 0.5000000
#2: A N 1 0.5000000
#3: B M 2 0.6666667
#4: B N 1 0.3333333

What is a more elegant way to calculate within-group proportions in dplyr?
