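I am using RStudio, and I have a data frame (df1).
df1 contains several columns, but I am interested in these three: compName, dept and losYRS.
I want to get the percentage distribution of each dept by compName.
My R code, which handles a single company (A), is as follows: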
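library(lubridate)  # not needed for this calculation
library(tidyverse)
# Keep only company "A"
df2 <- subset(df1, compName %in% c("A"))
# Percentage of company A's rows in each dept;
# nrow(.) is the number of rows in the data piped into summarise()
df3 <- df2 %>%
  group_by(dept) %>%
  summarise(count = n() / nrow(.) * 100)
df3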
# A tibble: 11 x 2
dept count
<chr> <dbl>
1 F&B (Kitchen) 18.6
2 F&B (Restaurant) 20.3
3 FINANCE 5.08
4 FRONT OFFICE 10.2
5 HOUSEKEEPING 22.0
6 HR 1.69
7 LEISURE AND SPORT 3.39
8 MAINTENANCE 8.47
9 RESERVATION 1.69
10 SPA 5.08
11 STEWARDING 3.39
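Instead of subsetting for each compName every time, is there a way to get all the results at once, with one row per dept and one column per compName (with or without a Grand Total row)?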
I would also like the numbers formatted with zero decimal places and a % sign appended.
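For comparison, here is a base R sketch that covers both requests. prop.table() with margin = 2 divides each company's column of a dept-by-compName cross-tab by that column's total, and paste0(round(x), "%") turns the values into whole-number percentage strings. The sketch assumes the sample data from this post, which is rebuilt inline so the snippet runs on its own. pct_raw and pct_fmt are illustrative names.

```r
# Sample data (same compName/dept values as the dput() output in this post)
df1 <- data.frame(
  compName = c("A", "A", "C", "B", "C", "A", "A", "B", "B", "A",
               "C", "C", "A", "B", "B", "A", "C", "C", "A", "B"),
  dept = c("MAINTENANCE", "OPERATIONS", "F&B (Kitchen)", "F&B (Kitchen)",
           "HOUSEKEEPING", "F&B (Restaurant)", "RESERVATION",
           "F&B (Restaurant)", "HOUSEKEEPING", "MAINTENANCE",
           "FRONT OFFICE", "HOUSEKEEPING", "MAINTENANCE", "HOUSEKEEPING",
           "MAINTENANCE", "F&B (Restaurant)", "HOUSEKEEPING",
           "F&B (Restaurant)", "F&B (Restaurant)", "MAINTENANCE"),
  stringsAsFactors = FALSE
)

# Rows = dept, columns = compName; margin = 2 divides each cell by its
# column (company) total, so every column sums to 100
pct_raw <- 100 * prop.table(table(dept = df1$dept, compName = df1$compName),
                            margin = 2)

# Round to zero decimal places and append "%", keeping the row/column names
pct_fmt <- matrix(paste0(round(pct_raw), "%"), nrow = nrow(pct_raw),
                  dimnames = dimnames(pct_raw))

# Optional Grand Total row, built from the unrounded column sums
pct_fmt <- rbind(pct_fmt, "Grand Total" = paste0(round(colSums(pct_raw)), "%"))

print(pct_fmt, quote = FALSE)
```

R's round() rounds exact halves to the even digit (round(12.5) is 12), so rounded values in a column need not add up to exactly 100. For that reason the Grand Total is computed from the unrounded values.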
Sample data (from dput):
structure(list(compName2 = c("A", "A", "C",
"B", "C", "A", "A", "B", "B",
"A", "C", "C", "A","B", "B", "A", "C", "C",
"A", "B"), dept = c("MAINTENANCE", "OPERATIONS",
"F&B (Kitchen)", "F&B (Kitchen)", "HOUSEKEEPING", "F&B (Restaurant)",
"RESERVATION", "F&B (Restaurant)", "HOUSEKEEPING", "MAINTENANCE",
"FRONT OFFICE", "HOUSEKEEPING", "MAINTENANCE", "HOUSEKEEPING",
"MAINTENANCE", "F&B (Restaurant)", "HOUSEKEEPING", "F&B (Restaurant)",
"F&B (Restaurant)", "MAINTENANCE"), losYRS = c(31, 30, 29, 28,
28, 28, 28, 27, 27, 27, 27, 27, 27, 26, 26, 26, 26, 26, 26, 25
)), .Names = c("compName", "dept", "losYRS"), row.names = c(NA,
20L), class = "data.frame")
Answer
Here is a tidyverse solution to your problem:
library(tidyverse)
### Calculate percentages (df1 is the sample data from the question)
df2 <- df1 %>% group_by(compName) %>% group_split() %>%
  map(~ group_by(.x, dept)) %>%
  map(function(x) {
    # x holds one company; nrow(x) is that company's total row count
    summarize(x, perc = n() / nrow(x) * 100, compName = x$compName[1])
  }) %>%
  bind_rows() %>%
  spread(compName, perc)
### Same result with base::split(), for dplyr versions without group_split():
df2 <- df1 %>% split(f = df1$compName) %>% map(~ group_by(.x, dept)) %>%
  map(function(x) summarize(x, perc = n() / nrow(x) * 100, compName = x$compName[1])) %>%
  bind_rows() %>%
  spread(compName, perc)
## Create summary df for printing: round, add a Grand Total row, append "%"
print_df <- df2 %>% mutate_at(c("A", "B", "C"), round) %>%
  bind_rows(df2 %>% replace(is.na(.), 0) %>%
              summarize_at(c("A", "B", "C"), "sum") %>% mutate_all(round) %>%
              bind_cols(enframe("Grand Total", value = "dept", name = NULL))) %>%
  # paste0() avoids the space that paste() would insert before "%"
  mutate_at(c("A", "B", "C"), function(x) paste0(x, "%"))
# Write as a tab-separated .txt file (write.table() uses a space by default)
write.table(print_df, file = "Test.txt", sep = "\t", row.names = FALSE)
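This sketch assumes dplyr 1.0 or later and tidyr 1.1 or later, which provide across() and a scalar values_fill. With those versions, the same table can be built without the split/map/bind_rows loop. count() tallies each company/dept pair, a grouped mutate() turns the counts into within-company percentages, and pivot_wider() (the successor to spread()) spreads the companies into columns. It is an alternative sketch, not the answer's own method. perc_wide and totals are illustrative names, and the file is written to tempdir() so the snippet runs anywhere.

```r
library(dplyr)
library(tidyr)

# Sample data from the question (compName and dept only; losYRS is not needed)
df1 <- data.frame(
  compName = c("A", "A", "C", "B", "C", "A", "A", "B", "B", "A",
               "C", "C", "A", "B", "B", "A", "C", "C", "A", "B"),
  dept = c("MAINTENANCE", "OPERATIONS", "F&B (Kitchen)", "F&B (Kitchen)",
           "HOUSEKEEPING", "F&B (Restaurant)", "RESERVATION",
           "F&B (Restaurant)", "HOUSEKEEPING", "MAINTENANCE",
           "FRONT OFFICE", "HOUSEKEEPING", "MAINTENANCE", "HOUSEKEEPING",
           "MAINTENANCE", "F&B (Restaurant)", "HOUSEKEEPING",
           "F&B (Restaurant)", "F&B (Restaurant)", "MAINTENANCE"),
  stringsAsFactors = FALSE
)

# Percentage of each dept within each company, in one grouped pass
perc_wide <- df1 %>%
  count(compName, dept) %>%                 # rows per company/dept pair
  group_by(compName) %>%
  mutate(perc = n / sum(n) * 100) %>%       # sum(n) is the company total
  ungroup() %>%
  select(-n) %>%
  pivot_wider(names_from = compName, values_from = perc,
              values_fill = 0) %>%          # 0 where a company has no such dept
  arrange(dept)

# Grand Total row from the unrounded column sums
totals <- perc_wide %>%
  summarise(across(-dept, sum)) %>%
  mutate(dept = "Grand Total")

# Round to whole numbers and append "%" for printing
print_df <- bind_rows(perc_wide, totals) %>%
  mutate(across(-dept, ~ paste0(round(.x), "%")))

print_df

# Tab-separated output; quote = FALSE drops the quotes around each string
out_file <- file.path(tempdir(), "Test.txt")
write.table(print_df, file = out_file, sep = "\t",
            row.names = FALSE, quote = FALSE)
```

Because values_fill = 0 already fills in departments that a company lacks, the Grand Total sums need no separate NA replacement step.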