Given the data_frame df <- data_frame(X = c('A', 'A', 'B', 'B', 'B'), Y = c('M', 'N', 'M', 'M', 'N')), I need to come up with a data frame that tells us that 50% of A is M, 50% of A is N, 67% of B is M, and 33% of B is N.
I have a routine that does this, but it looks horrible.
library(tidyverse)
df <- data_frame(X = c('A', 'A', 'B', 'B', 'B'), Y = c('M', 'N', 'M', 'M', 'N'))
# here we go...
df %>%
  group_by(X) %>%
  mutate(n_X = n()) %>%
  group_by(X, Y) %>%
  summarise(PERCENT = n() / first(n_X))
Output:
Source: local data frame [4 x 3]
Groups: X [?]
      X     Y   PERCENT
  <chr> <chr>     <dbl>
1     A     M 0.5000000
2     A     N 0.5000000
3     B     M 0.6666667
4     B     N 0.3333333
Is there a better way to do this? Surely I'm missing something.
Answer 0: (score: 6)
You can use prop.table:
df %>%
  group_by(X, Y) %>%
  count() %>%
  mutate(PERCENT = prop.table(n))
Result:
      X     Y     n   PERCENT
  <chr> <chr> <int>     <dbl>
1     A     M     1 0.5000000
2     A     N     1 0.5000000
3     B     M     2 0.6666667
4     B     N     1 0.3333333
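A note on versions: the result above relies on the counted data still being grouped by X only. Depending on the dplyr version, count() on a data frame grouped by X and Y may keep both grouping variables, in which case prop.table(n) would be computed within single-row groups and return 1 for every row. The following is a minimal sketch that makes the grouping explicit so it should not depend on that behaviour; it is a variant of the answer above, not the answerer's own code, and it assumes tibble() as re-exported by dplyr in place of the older data_frame().

```r
library(dplyr)

df <- tibble(X = c('A', 'A', 'B', 'B', 'B'), Y = c('M', 'N', 'M', 'M', 'N'))

result <- df %>%
  count(X, Y) %>%                    # one row per (X, Y) pair, count stored in n
  group_by(X) %>%                    # regroup by X only, so sums are per X
  mutate(PERCENT = n / sum(n)) %>%   # share of each Y within its X
  ungroup()

result
```

Here n / sum(n) does the same job as prop.table(n) on a plain vector; either form can be used inside mutate().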
Answer 1: (score: 4)
We can use table and rowSums:
new_df <- table(df$X, df$Y)
new_df/rowSums(new_df)
# M N
# A 0.5000000 0.5000000
# B 0.6666667 0.3333333
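The same row-wise proportions can also be obtained in one step with prop.table and its margin argument. This is a small sketch of an alternative, not part of the original answer; it uses a plain data.frame so that it runs without extra packages.

```r
df <- data.frame(X = c('A', 'A', 'B', 'B', 'B'),
                 Y = c('M', 'N', 'M', 'M', 'N'))

# margin = 1 divides each cell by its row total, so every row sums to 1
props <- prop.table(table(df$X, df$Y), margin = 1)
props
```

Dividing by rowSums, as in the answer, and prop.table with margin = 1 should give the same table; the latter just names the intent.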
Answer 2: (score: 2)
Maybe something like this:
base R
tbl <- xtabs(~X+Y, df)
as.data.frame(tbl/rowSums(tbl), responseName = "prop")
data.table
library(data.table)
DT <- data.table(df)[, .N, by = .(X,Y)]
setDT(DT)[, prop := N/sum(N), by = 'X']
DT
# X Y N prop
#1: A M 1 0.5000000
#2: A N 1 0.5000000
#3: B M 2 0.6666667
#4: B N 1 0.3333333