I keep getting the error I mentioned, and all I want to do is run a simple sum() function on a data frame:
library(readr)  # read_csv()
library(dplyr)  # %>%, filter(), summarise()

data <- read_csv("ucb_admissions.csv")
data %>%
  filter(Gender == "Female") %>%
  summarise(sum(Freq))
Written this way the code works, but I don't understand why it doesn't work without the summarise() function.
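A likely explanation, assuming the error in question is R's "object 'Freq' not found" (the exact message is not quoted here): `%>%` passes its left-hand side as the first argument, so `... %>% sum(Freq)` is evaluated as `sum(<filtered data>, Freq)`. dplyr verbs such as `filter()` and `summarise()` look up bare column names inside the data frame, but base `sum()` does not, so R searches for a standalone variable called `Freq` and fails. The sketch below rebuilds the example data from the edit further down so it runs on its own, then puts the failing call next to two working alternatives.

```r
library(dplyr)

# Rebuild the example data (same construction as the reproducible example in the edit)
set.seed(5143)
data <- expand.grid(Admit = c("Admitted", "Rejected"),
                    Gender = c("Male", "Female"),
                    Dept = LETTERS[1:4])
data$Freq <- sample.int(600, nrow(data), TRUE)

# The pipe turns this into sum(<filtered data>, Freq). sum() cannot see the
# column Freq, so R looks for a variable named Freq elsewhere and errors.
res <- tryCatch(data %>% filter(Gender == "Female") %>% sum(Freq),
                error = function(e) conditionMessage(e))
print(res)  # the captured error message

# Working alternatives: make Freq come from the data frame itself.
# dplyr::pull() extracts the column as a plain vector, which sum() then receives.
total_pull <- data %>% filter(Gender == "Female") %>% pull(Freq) %>% sum()
# with() evaluates sum(Freq) using the columns of the data piped into it.
total_with <- data %>% filter(Gender == "Female") %>% with(sum(Freq))
```

`pull()` keeps everything in dplyr style; `with()` is base R and works the same way outside a pipe.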
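(The data frame I'm using was posted as an image.) All I want to know is the number of women in the data. If you can think of a better solution, please let me know.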
EDIT:
A data set with the same structure as the one in the image can be created with:
set.seed(5143) # Make the results reproducible
Admit <- c("Admitted", "Rejected")
Gender <- c("Male", "Female")
Dept <- LETTERS[1:4]
data <- expand.grid(Admit, Gender, Dept)
names(data) <- c("Admit", "Gender", "Dept")
data$Freq <- sample.int(600, nrow(data), TRUE)
Answer 0 (score: 1)
library(dplyr)
data %>% group_by(Gender) %>% summarise(Number=n())
## A tibble: 2 x 2
#  Gender Number
#  <fct>   <int>
#1 Male        8
#2 Female      8
Using base R:
nrow(data[data$Gender=='Female',])
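One caveat about counting with `n()` or `nrow()`: they count rows, and in this data each row is one Admit/Gender/Dept combination whose head count is stored in `Freq`. So the 8 above is the number of female rows, not the number of women. A minimal sketch using dplyr's `count()` with its `wt` argument, which sums a weight column instead of counting rows (the data is rebuilt as in the question's edit so the block is self-contained):

```r
library(dplyr)

# Rebuild the example data from the question's edit
set.seed(5143)
data <- expand.grid(Admit = c("Admitted", "Rejected"),
                    Gender = c("Male", "Female"),
                    Dept = LETTERS[1:4])
data$Freq <- sample.int(600, nrow(data), TRUE)

# Without wt, count() counts rows: 2 Admit levels x 4 departments = 8 per gender
rows_per_gender <- data %>% count(Gender)

# With wt = Freq, count() adds up Freq within each gender instead
people_per_gender <- data %>% count(Gender, wt = Freq)
print(people_per_gender)
```

Both results have a column named `n`; only the weighted version gives the number of people.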
Answer 1 (score: 1)
Here are five solutions: four in base R and one using dplyr.
library(dplyr)
with(data, sum(Freq[Gender == "Female"]))
#[1] 2662
sum(data[data$Gender == "Female", "Freq"])
#[1] 2662
with(data, tapply(Freq, Gender, sum))
# Male Female
# 2162 2662
aggregate(Freq ~ Gender, data, sum)
# Gender Freq
#1 Male 2162
#2 Female 2662
data %>% group_by(Gender) %>% summarise(Total = sum(Freq))
## A tibble: 2 x 2
# Gender Total
# <fct> <int>
#1 Male 2162
#2 Female 2662
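Since the question only asks for the number of women, the grouped results can be reduced to that single value. A small sketch, assuming the same `data` as in the question's edit (rebuilt here): `[[` on the named `tapply()` result returns the bare number, and the `aggregate()` result is an ordinary data frame that can be subset by its `Gender` column.

```r
# Rebuild the example data from the question's edit
set.seed(5143)
data <- expand.grid(Admit = c("Admitted", "Rejected"),
                    Gender = c("Male", "Female"),
                    Dept = LETTERS[1:4])
data$Freq <- sample.int(600, nrow(data), TRUE)

# tapply() returns one total per gender, named by the factor levels
totals <- with(data, tapply(Freq, Gender, sum))
female_total <- totals[["Female"]]  # [[ ]] drops the name, leaving a plain number

# aggregate() returns a data frame with columns Gender and Freq
agg <- aggregate(Freq ~ Gender, data, sum)
female_total_agg <- agg$Freq[agg$Gender == "Female"]

print(female_total)
```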
Now let's benchmark the five methods.
library(ggplot2)
library(microbenchmark)
mb <- microbenchmark(
sum1 = with(data, sum(Freq[Gender == "Female"])),
sum2 = sum(data[data$Gender == "Female", "Freq"]),
tapply = with(data, tapply(Freq, Gender, sum)),
agg = aggregate(Freq ~ Gender, data, sum),
dplyr = data %>% group_by(Gender) %>% summarise(Total = sum(Freq))
)
mb
#Unit: microseconds
# expr min lq mean median uq max neval
# sum1 58.946 72.9495 92.31978 86.7075 102.7015 317.988 100
# sum2 139.752 171.6000 197.02931 187.4195 213.3305 323.226 100
# tapply 178.584 208.8955 237.48214 237.8795 259.6350 366.596 100
# agg 2824.940 2959.0000 3194.69868 3070.5720 3343.5465 5156.801 100
# dplyr 3239.238 3361.0070 4377.61585 3506.0325 3753.1655 82005.883 100
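The base R solutions are faster, especially the three that subset directly: sum1, sum2 and tapply have median times of roughly 87, 187 and 238 microseconds, versus about 3,071 for aggregate() and 3,506 for dplyr.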
Plotting the microbenchmark results requires ggplot2.
autoplot(mb)