I am trying to reproduce the following Excel pivot table in R:
Using tabular from the tables package:
library(vcd)
library(tables)
tabular(Sex*(Treatment+1)+1~(count=ID + Percent("col")), data=Arthritis)
This produces:
                       count
Sex    Treatment  ID   Percent
Female Placebo    32    38.10
       Treated    27    32.14
       All        59    70.24
Male   Placebo    11    13.10
       Treated    14    16.67
       All        25    29.76
       All        84   100.00
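Is there a way to make the Treatment percentages add up to 100% within each Sex, as they do in the Excel pivot table?
Answer 0: (score: 1)
Everything except the final All row can be done as follows.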
library(dplyr)
library(tidyr)
df <- Arthritis %>%
group_by(Sex, Treatment) %>%
summarise(cnt = n()) %>%
ungroup() %>%
spread(Treatment, cnt) %>%
mutate(All = Placebo + Treated) %>%
gather(Treatment, ID , -Sex) %>%
group_by(Sex) %>%
mutate(percent = ID / (sum(ID) / 2)) %>%
arrange(Sex, desc(Treatment)) #forces "Treated" to top of Treatment column for each group
> df
Source: local data frame [6 x 4]
Groups: Sex [2]
Sex Treatment ID percent
<fctr> <chr> <int> <dbl>
1 Female Treated 27 0.4576271
2 Female Placebo 32 0.5423729
3 Female All 59 1.0000000
4 Male Treated 14 0.5600000
5 Male Placebo 11 0.4400000
6 Male All 25 1.0000000
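The division by sum(ID) / 2 in the pipeline above can look odd at first. Each Sex group contains the two treatment rows plus an All row that repeats their total, so sum(ID) is twice the group size. The following is a minimal base R sketch of that arithmetic. It assumes the counts printed above are typed in by hand instead of being derived from Arthritis, and the names long and group_sum are made up for illustration.

```r
# Hand-entered counts matching the output above (assumption: not read from Arthritis)
long <- data.frame(
  Sex = rep(c("Female", "Male"), each = 3),
  Treatment = rep(c("Treated", "Placebo", "All"), times = 2),
  ID = c(27, 32, 59, 14, 11, 25),
  stringsAsFactors = FALSE
)

# Per-Sex sum of ID, repeated on every row of the group.
# The All row doubles the total: 118 for Female, 50 for Male.
group_sum <- ave(long$ID, long$Sex, FUN = sum)

# Halving the doubled sum recovers the true group size (59 and 25)
long$percent <- long$ID / (group_sum / 2)
print(long)
```

If the All row were added after computing the percentages, dividing by sum(ID) alone would be enough.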
If you want a grand total row, you can use the following, though it isn't very pretty.
grand_total <- data.frame(Sex = "Total" , "Treatment" = "All",
ID = nrow(Arthritis), percent = 1,
stringsAsFactors = FALSE)
df_final <- bind_rows(df, grand_total)
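As a possible variation, the grand total can be computed from the summary table itself instead of from nrow(Arthritis), which keeps the total consistent if the data were filtered earlier. This is only a sketch in base R with hand-entered values. The data frame summary_tbl is a hypothetical stand-in for df, and rbind is swapped in for bind_rows so that no packages are needed.

```r
# Hypothetical stand-in for the summarised df (values copied from the output above)
summary_tbl <- data.frame(
  Sex = rep(c("Female", "Male"), each = 3),
  Treatment = rep(c("Treated", "Placebo", "All"), times = 2),
  ID = c(27L, 32L, 59L, 14L, 11L, 25L),
  percent = c(27 / 59, 32 / 59, 1, 14 / 25, 11 / 25, 1),
  stringsAsFactors = FALSE
)

# Grand total = sum of the per-Sex "All" rows
total_row <- data.frame(
  Sex = "Total",
  Treatment = "All",
  ID = sum(summary_tbl$ID[summary_tbl$Treatment == "All"]),
  percent = 1,
  stringsAsFactors = FALSE
)

# rbind works here because both data frames have the same column names
with_total <- rbind(summary_tbl, total_row)
print(with_total)
```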
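Now, if you want to blank out all but the first occurrence of each value in the Sex column, you can do the following. Because we sorted the Treatment column in descending order, we know that Treated is at the top of each group. So we simply replace Sex with an empty string wherever Treatment is not equal to Treated. Restricting the replacement to Female and Male leaves the grand total row we created untouched.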
df_final$Sex[df_final$Treatment != "Treated" &
df_final$Sex %in% c("Female", "Male")] <- ""
Source: local data frame [7 x 4]
Groups: Sex [3]
Sex Treatment ID percent
<chr> <chr> <int> <dbl>
1 Female Treated 27 0.4576271
2 Placebo 32 0.5423729
3 All 59 1.0000000
4 Male Treated 14 0.5600000
5 Placebo 11 0.4400000
6 All 25 1.0000000
7 Total All 84 1.0000000
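The blanking step above depends on Treated being the first row of each group. A sketch of an alternative that does not rely on the sort order uses base R's duplicated(), which flags every repeat of a value after its first appearance. The data frame below is hand-entered for illustration (named pivot here, a hypothetical name) and assumes Sex is a character column, as it is after bind_rows.

```r
# Hand-entered copy of the final table before blanking (illustrative only)
pivot <- data.frame(
  Sex = c(rep("Female", 3), rep("Male", 3), "Total"),
  Treatment = c("Treated", "Placebo", "All", "Treated", "Placebo", "All", "All"),
  ID = c(27L, 32L, 59L, 14L, 11L, 25L, 84L),
  stringsAsFactors = FALSE
)

# duplicated() is TRUE for the 2nd, 3rd, ... occurrence of each label,
# so only the first row of each Sex block keeps its label
pivot$Sex[duplicated(pivot$Sex)] <- ""
print(pivot)
```

The Total row keeps its label without any special handling because it occurs only once.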