Error in FUN(X[[i]], ...): only defined on a data frame with all numeric variables

Time: 2018-08-12 16:37:31

Tags: r

I keep getting the error in the title, although all I want to do is run a simple sum() function on a data frame.

library(readr)
library(dplyr)

data <- read_csv("ucb_admissions.csv")

data %>% 
  filter(Gender == "Female") %>%
  summarise(
    sum(Freq)
  )

Written this way the code works, but I don't understand why it fails without the summarise() function.

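This is the data frame I am using; the only thing I want to know is the number of women in the data. If you can think of a better solution, please let me know.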

[Image: the data frame]

Edit.

A dataset with the same structure as the one in the image could be built like this:

set.seed(5143)    # Make the results reproducible

Admit <- c("Admitted", "Rejected")
Gender <- c("Male", "Female")
Dept <- LETTERS[1:4]

data <- expand.grid(Admit, Gender, Dept)
names(data) <- c("Admit", "Gender", "Dept")
data$Freq <- sample.int(600, nrow(data), TRUE)

2 answers:

Answer 0: (score: 1)

library(dplyr)
data %>% group_by(Gender) %>% summarise(Number=n()) 

# # A tibble: 2 x 2
#   Gender Number
#   <fct>   <int>
# 1 Male        8
# 2 Female      8

Using base R:

nrow(data[data$Gender=='Female',])
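Because the Freq column in the example data holds counts, the number of rows for a gender and the number of people of that gender can differ. A minimal base R sketch of the difference, using a made-up data frame `toy` whose values are illustrative and not taken from the question's data:

```r
# Hypothetical toy data: each row is a group, Freq is the group's size.
toy <- data.frame(
  Gender = c("Male", "Female", "Female"),
  Freq   = c(10L, 20L, 30L)
)

# Number of rows where Gender is "Female"
n_rows <- nrow(toy[toy$Gender == "Female", ])

# Number of people: the sum of the counts in those rows
n_people <- sum(toy$Freq[toy$Gender == "Female"])

n_rows
n_people
```

Which of the two is wanted depends on whether each row is one person or an aggregated count.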

Answer 1: (score: 1)

Here are five solutions: four in base R and one using dplyr.

library(dplyr)


with(data, sum(Freq[Gender == "Female"]))
#[1] 2662

sum(data[data$Gender == "Female", "Freq"])
#[1] 2662

with(data, tapply(Freq, Gender, sum))
#  Male Female 
#  2162   2662


aggregate(Freq ~ Gender, data, sum)
#  Gender Freq
#1   Male 2162
#2 Female 2662

data %>% group_by(Gender) %>% summarise(Total = sum(Freq))
## A tibble: 2 x 2
#  Gender Total
#  <fct>  <int>
#1 Male    2162
#2 Female  2662
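As for why the error in the question appears, one plausible explanation is that sum() was applied to the whole filtered data frame rather than to the Freq column. The failing call is not shown in the question, so this is an assumption. The sketch below uses a made-up data frame `df` to show that sum() on a data frame with a non-numeric column fails, while summing the numeric column alone works:

```r
# Hypothetical reconstruction; the question does not show the failing call.
df <- data.frame(
  Gender = c("Male", "Female", "Female"),
  Freq   = c(10L, 20L, 30L)
)

# sum() on a data frame uses the data-frame Summary method, which
# stops if any column (here Gender) is not numeric.
res <- try(sum(df[df$Gender == "Female", ]), silent = TRUE)
inherits(res, "try-error")

# Selecting the numeric column first hands sum() a plain integer vector.
total <- sum(df[df$Gender == "Female", "Freq"])
total
```

Inside summarise(), sum(Freq) only ever sees the Freq column, which would explain why that version runs without the error.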

Now let's benchmark the five approaches.

library(ggplot2)
library(microbenchmark)

mb <- microbenchmark(
    sum1 = with(data, sum(Freq[Gender == "Female"])),
    sum2 = sum(data[data$Gender == "Female", "Freq"]),
    tapply = with(data, tapply(Freq, Gender, sum)),
    agg = aggregate(Freq ~ Gender, data, sum),
    dplyr = data %>% group_by(Gender) %>% summarise(Total = sum(Freq))
)

mb
#Unit: microseconds
#   expr      min        lq       mean    median        uq       max neval
#   sum1   58.946   72.9495   92.31978   86.7075  102.7015   317.988   100
#   sum2  139.752  171.6000  197.02931  187.4195  213.3305   323.226   100
# tapply  178.584  208.8955  237.48214  237.8795  259.6350   366.596   100
#    agg 2824.940 2959.0000 3194.69868 3070.5720 3343.5465  5156.801   100
#  dplyr 3239.238 3361.0070 4377.61585 3506.0325 3753.1655 82005.883   100

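The base R solutions are faster: the sum() and tapply() versions have medians under 0.25 ms, while aggregate() (about 3.1 ms) is only slightly ahead of dplyr (about 3.5 ms).

Plotting the microbenchmark results with autoplot() requires ggplot2.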

autoplot(mb)
