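I have a data frame with Rank, Status & count, created by aggregating a parent data frame. I want to find the proportion/percentage as shown below,
i.e. what is the percentage/ratio of Incomplete to Complete for each Rank?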
   Rank  Status     `n()`  ratio
   <fct> <fct>      <int>
 1 A     Incomplete   602
 2 A     Complete    9443  602/9443
 3 B     Incomplete  1425
 4 B     Complete   10250  ----
 5 C     Incomplete  1347  ----
 6 C     Complete    6487
 7 D     Incomplete  1118
 8 D     Complete    3967
 9 E     Incomplete   715
10 E     Complete    1948
I tried sapply() to iterate over the rows, calculate the ratio and store it in another df, but is there a better way?
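For comparison, a vectorised base-R sketch that avoids the sapply() loop. It assumes the aggregated data frame has columns Rank, Status and Count (the question's `n()` column renamed); tapply() builds a Rank x Status table, after which the ratio is plain column arithmetic.

```r
# Aggregated counts as in the question (`n()` renamed to Count; an assumption)
df <- data.frame(
  Rank   = rep(c("A", "B", "C", "D", "E"), each = 2),
  Status = rep(c("Incomplete", "Complete"), times = 5),
  Count  = c(602, 9443, 1425, 10250, 1347, 6487, 1118, 3967, 715, 1948)
)

# Cross-tabulate: one row per Rank, one column per Status (counts summed)
m <- with(df, tapply(Count, list(Rank, Status), sum))

# Whole-column arithmetic replaces the row-by-row loop
result <- data.frame(
  Rank       = rownames(m),
  Ratio      = m[, "Incomplete"] / m[, "Complete"],  # e.g. 602/9443 for A
  Percentage = 100 * m[, "Incomplete"] / rowSums(m), # Incomplete share of the Rank total
  row.names  = NULL
)
result
```

Because tapply() sums within each Rank/Status cell, the same code also works if a Rank has more than one row per Status.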
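Alternatively, it would be great if a stacked bar chart could be labelled with the percentage/ratio on top of each bar.
The stacked bar chart I tried shows the percentage of the total rather than the ratio.
Thanks.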
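For the chart part, a minimal sketch using base R graphics (barplot() and text()) rather than ggplot2, with the same assumed Rank/Status/Count columns. Each bar stacks Complete and Incomplete, and the label above it is the Incomplete/Complete ratio rather than a share of the total.

```r
df <- data.frame(
  Rank   = rep(c("A", "B", "C", "D", "E"), each = 2),
  Status = rep(c("Incomplete", "Complete"), times = 5),
  Count  = c(602, 9443, 1425, 10250, 1347, 6487, 1118, 3967, 715, 1948)
)

# Rows = Status, columns = Rank: barplot() stacks the rows of a matrix
m <- with(df, tapply(Count, list(Status, Rank), sum))
ratio  <- m["Incomplete", ] / m["Complete", ]
totals <- colSums(m)

# barplot() returns the x position of each stacked bar
mids <- barplot(m, legend.text = TRUE, ylim = c(0, 1.15 * max(totals)),
                xlab = "Rank", ylab = "Count")

# Write the Incomplete/Complete ratio just above the top of each bar
text(mids, totals, labels = sprintf("%.3f", ratio), pos = 3)
```

Extra headroom in `ylim` keeps the labels inside the plot area; swap the `sprintf()` format for `sprintf("%.1f%%", 100 * ratio)` if a percentage label is preferred.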
Answer 0 (score: 2)
Using dplyr:
library(dplyr)
df <- data.frame(Rank = c("A", "A", "B", "B", "C", "C", "D", "D", "E", "E"),
                 Status = c("Incomplete", "Complete", "Incomplete", "Complete",
                            "Incomplete", "Complete", "Incomplete", "Complete",
                            "Incomplete", "Complete"),
                 Count = c(602, 9443, 1425, 10250, 1347, 6487, 1118, 3967, 715, 1948))
# Ratio: each Status's share of its Rank's total
df %>% group_by(Rank) %>% mutate(Ratio = Count/sum(Count))
# A tibble: 10 x 4
# Groups: Rank [5]
# Rank Status Count Ratio
# <fct> <fct> <dbl> <dbl>
# 1 A Incomplete 602. 0.0599
# 2 A Complete 9443. 0.940
# 3 B Incomplete 1425. 0.122
# 4 B Complete 10250. 0.878
# 5 C Incomplete 1347. 0.172
# 6 C Complete 6487. 0.828
# 7 D Incomplete 1118. 0.220
# 8 D Complete 3967. 0.780
# 9 E Incomplete 715. 0.268
#10 E Complete 1948. 0.732
# Percentage
df %>% group_by(Rank) %>% mutate(Percentage = (Count/sum(Count))*100)
# A tibble: 10 x 4
# Groups: Rank [5]
# Rank Status Count Percentage
# <fct> <fct> <dbl> <dbl>
# 1 A Incomplete 602. 5.99
# 2 A Complete 9443. 94.0
# 3 B Incomplete 1425. 12.2
# 4 B Complete 10250. 87.8
# 5 C Incomplete 1347. 17.2
# 6 C Complete 6487. 82.8
# 7 D Incomplete 1118. 22.0
# 8 D Complete 3967. 78.0
# 9 E Incomplete 715. 26.8
#10 E Complete 1948. 73.2
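The Ratio above is each Status's share of its Rank's total. If the Incomplete/Complete ratio from the question is wanted instead (602/9443 for A), a sketch with the same dplyr verbs collapses each Rank to one row; it assumes dplyr is installed and the columns are named as in this answer.

```r
library(dplyr)

df <- data.frame(
  Rank   = rep(c("A", "B", "C", "D", "E"), each = 2),
  Status = rep(c("Incomplete", "Complete"), times = 5),
  Count  = c(602, 9443, 1425, 10250, 1347, 6487, 1118, 3967, 715, 1948)
)

# One row per Rank: Incomplete count divided by Complete count.
# sum() also covers a Rank that has several rows with the same Status.
ratios <- df %>%
  group_by(Rank) %>%
  summarise(Ratio = sum(Count[Status == "Incomplete"]) /
                    sum(Count[Status == "Complete"]))
ratios
```

The two forms are related: dividing the Incomplete share by the Complete share gives the same number, e.g. 0.0599 / 0.940 ≈ 0.0637 for Rank A.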
Answer 1 (score: 1)
Using dcast from data.table:
Code:
library('data.table')
dcast(setDT(df), formula = Rank~Status, value.var = "count")[, ratio := Incomplete / Complete][]
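If a given Rank has duplicate statuses, e.g. Rank A has two Incomplete rows with counts 602 and 605, the following version handles it by summing the counts first.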
dcast(setDT(df2)[, .(count = sum(count)), by = .(Rank, Status)], # sum count by Status and Rank
formula = Rank~Status, value.var = "count")[, ratio := Incomplete / Complete][]
Output:
Without duplicate statuses:
# Rank Complete Incomplete ratio
# 1: A 9443 602 0.06375093
# 2: B 10250 1425 0.13902439
# 3: C 6487 1347 0.20764606
# 4: D 3967 1118 0.28182506
# 5: E 1948 715 0.36704312
With duplicate statuses:
# Rank Complete Incomplete ratio
# 1: A 9443 1207 0.1278195
# 2: B 10250 1425 0.1390244
# 3: C 6487 1347 0.2076461
# 4: D 3967 1118 0.2818251
# 5: E 1948 715 0.3670431
Data:
Without duplicate statuses:
df <- read.table(text='Rank Status `n()`
1 A Incomplete 602
2 A Complete 9443
3 B Incomplete 1425
4 B Complete 10250
5 C Incomplete 1347
6 C Complete 6487
7 D Incomplete 1118
8 D Complete 3967
9 E Incomplete 715
10 E Complete 1948')
colnames(df)[3] <- 'count'
With duplicate statuses:
df2 <- read.table(text='Rank Status `n()`
1 A Incomplete 602
2 A Incomplete 605
2.1 A Complete 9443
3 B Incomplete 1425
4 B Complete 10250
5 C Incomplete 1347
6 C Complete 6487
7 D Incomplete 1118
8 D Complete 3967
9 E Incomplete 715
10 E Complete 1948')
colnames(df2)[3] <- 'count'
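For readers without data.table, a base-R sketch of the same long-to-wide step: xtabs() sums the left-hand side over duplicate Rank/Status cells, which plays the role of the sum-by-group before dcast() above. The inline data mirrors df2 and is an assumption of this sketch.

```r
# Long data with a duplicated status (two Incomplete rows for Rank A), like df2
df2 <- data.frame(
  Rank   = c("A", "A", "A", "B", "B", "C", "C", "D", "D", "E", "E"),
  Status = c("Incomplete", "Incomplete", "Complete", "Incomplete", "Complete",
             "Incomplete", "Complete", "Incomplete", "Complete",
             "Incomplete", "Complete"),
  count  = c(602, 605, 9443, 1425, 10250, 1347, 6487, 1118, 3967, 715, 1948)
)

# xtabs() sums `count` within each Rank x Status cell; unclass() leaves a plain matrix
wide <- unclass(xtabs(count ~ Rank + Status, data = df2))

res <- data.frame(
  Rank       = rownames(wide),
  Complete   = wide[, "Complete"],
  Incomplete = wide[, "Incomplete"],
  ratio      = wide[, "Incomplete"] / wide[, "Complete"],
  row.names  = NULL
)
res
```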
Answer 2 (score: 0)
I haven't used the dplyr package, but I think the following logic would work. Assume your data frame is df:
# creating sample data like yours
p <- c("Incomplete","Complete","Incomplete","Complete","Incomplete","Complete")
q <- c(602,9443,1425,10250,1347,6487)
# ignoring the ranks
df <- data.frame("Status" = p,"counts" = q)
# one slot per row, initialised to 0; the ratio is stored on the Complete rows
ratiovector <- numeric(NROW(df))
kcomp <- which(df$Status == "Complete")
kincomp <- which(df$Status == "Incomplete")
# assumes the k-th Incomplete row belongs with the k-th Complete row
ratiovector[kcomp] <- df$counts[kincomp]/df$counts[kcomp]
dfnew <- cbind(df,"ratio" = ratiovector)
# print dfnew
dfnew
# if you want it in string form, convert it (e.g. with format())
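The code above pairs rows purely by position, so it breaks if the rows are not in strict Incomplete/Complete order. A sketch that keeps the Rank column and pairs rows with match() instead; it assumes exactly one Incomplete row per Rank (match() takes the first hit), and the shuffled sample data is illustrative only.

```r
# Same counts as above, but with Rank kept and the rows deliberately out of order
df <- data.frame(
  Rank   = c("A", "B", "A", "C", "B", "C"),
  Status = c("Complete", "Incomplete", "Incomplete", "Complete", "Complete", "Incomplete"),
  counts = c(9443, 1425, 602, 6487, 10250, 1347)
)

incomplete <- df[df$Status == "Incomplete", ]
complete   <- df[df$Status == "Complete", ]

# For each Complete row, find the Incomplete row with the same Rank
idx <- match(complete$Rank, incomplete$Rank)
complete$ratio <- incomplete$counts[idx] / complete$counts
complete
```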
Answer 3 (score: 0)
In base R:
df$ratio <- ave(df$Count,df$Rank,FUN=function(x)x/sum(x))
# Rank Status Count ratio
# 1 A Incomplete 602 0.05993031
# 2 A Complete 9443 0.94006969
# 3 B Incomplete 1425 0.12205567
# 4 B Complete 10250 0.87794433
# 5 C Incomplete 1347 0.17194281
# 6 C Complete 6487 0.82805719
# 7 D Incomplete 1118 0.21986234
# 8 D Complete 3967 0.78013766
# 9 E Incomplete 715 0.26849418
# 10 E Complete 1948 0.73150582
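The ave() call above gives each row's share of its Rank's total. A variation of the same idea, sketched under the same column-name assumptions, returns the Incomplete/Complete ratio on every row of the Rank by passing row indices to ave() so that FUN can see both Count and Status.

```r
df <- data.frame(
  Rank   = rep(c("A", "B", "C", "D", "E"), each = 2),
  Status = rep(c("Incomplete", "Complete"), times = 5),
  Count  = c(602, 9443, 1425, 10250, 1347, 6487, 1118, 3967, 715, 1948)
)

# ave() splits its first argument by Rank and applies FUN to each piece.
# FUN returns one value per group, which ave() repeats over that group's rows.
df$ratio <- ave(seq_len(nrow(df)), df$Rank, FUN = function(i) {
  incomplete <- sum(df$Count[i][df$Status[i] == "Incomplete"])
  complete   <- sum(df$Count[i][df$Status[i] == "Complete"])
  incomplete / complete
})
df
```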