Question

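I am currently using the "diamonds" dataset from the ggplot2 package. I want to count the number of diamonds among the first n observations that meet two conditions, in this case color 'E' and clarity 'SI2'. I have written the function below to do this, but I would like to do it without running a for loop. Is there a way to make this function work without a for loop? The dataset has about 54,000 observations.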

library('ggplot2')
data(diamonds)

countfreq <- function(n) {
  #Set k to 0
  k <- 0
  for(i in 1:n) {
    if (diamonds$color[i] == 'E' & diamonds$clarity[i] == 'SI2') 
      k <- k + 1
  }
  return(k)
}

countfreq(50)
2
countfreq(100) 
3

The first two rows of the data frame are shown below.

  carat cut     color clarity depth table price    x    y    z
1  0.23 Ideal   E     SI2      61.5  55.0   326 3.95 3.98 2.43
2  0.21 Premium E     SI1      59.8  61.0   326 3.89 3.84 2.31

Answer 1

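I'll give you something that answers your question and also helps you understand a more general approach to this kind of problem using the dplyr package.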

library(ggplot2)
library(dplyr)

diamonds %>% # take the diamonds data frame and group it
    group_by(color, clarity) %>% # 56 groups
    summarize(count = n()) %>% # add a count column
    filter(color=="E", clarity=="SI2") %>%  # filter the row you want
    .$count # just the single value as a result

[1] 1713
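The pipeline above counts matches across the whole data set. To restrict the count to the first n rows, as the question asks, one loop-free option is to compare whole columns at once and sum the resulting logical vector. The following is a minimal base R sketch, not a definitive implementation: `count_first_n` and the `toy` data frame are hypothetical names made up for illustration, and `toy` merely stands in for `diamonds`.

```r
# Hypothetical helper: count the rows among the first n that satisfy
# both conditions, using vectorized comparison instead of a for loop.
count_first_n <- function(df, n, color = "E", clarity = "SI2") {
  first_n <- head(df, n)  # head() also copes with n larger than nrow(df)
  # TRUE counts as 1 and FALSE as 0 when summed
  sum(first_n$color == color & first_n$clarity == clarity, na.rm = TRUE)
}

# Small made-up data frame standing in for diamonds (assumption)
toy <- data.frame(
  color   = c("E", "E", "D", "E", "E"),
  clarity = c("SI2", "SI1", "SI2", "SI2", "VS1"),
  stringsAsFactors = FALSE
)

count_first_n(toy, 3)  # rows 1 to 3: only row 1 matches both conditions
count_first_n(toy, 5)  # rows 1 to 5: rows 1 and 4 match
```

With ggplot2 loaded, `count_first_n(diamonds, 50)` would be the analogous call; since it applies the same two tests as the loop in the question, it should agree with `countfreq(50)`.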

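Note that you can run any part of the code to see the intermediate results. For example, to see the table of groups and the count for each group, run the following part: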

diamonds %>% # take the diamonds data frame and group it
        group_by(color, clarity) %>% # 56 groups
        summarize(count = n())

# A tibble: 56 x 3
# Groups:   color [?]
   color clarity count
   <ord>   <ord> <int>
 1     D      I1    42
 2     D     SI2  1370
 3     D     SI1  2083
 4     D     VS2  1697
 5     D     VS1   705
 6     D    VVS2   553
 7     D    VVS1   252
 8     D      IF    73
 9     E      I1   102
10     E     SI2  1713
# ... with 46 more rows
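In the same spirit of inspecting intermediate results, if the count is needed for many different values of n, a running total can be computed once and then indexed. This is a sketch under stated assumptions: `running_matches` and the `toy` data frame are hypothetical names introduced for illustration, with `toy` standing in for `diamonds`.

```r
# Hypothetical helper: element i of the result is the number of rows
# among the first i that satisfy both conditions.
running_matches <- function(df, color = "E", clarity = "SI2") {
  cumsum(df$color == color & df$clarity == clarity)
}

# Small made-up data frame standing in for diamonds (assumption)
toy <- data.frame(
  color   = c("E", "E", "D", "E", "E"),
  clarity = c("SI2", "SI1", "SI2", "SI2", "VS1"),
  stringsAsFactors = FALSE
)

running <- running_matches(toy)
running[3]  # matches among the first 3 rows
running[5]  # matches among the first 5 rows
```

The design trade-off is one pass over the data up front in exchange for constant-time lookups afterwards, which only pays off when several values of n are queried.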

Count occurrences in a data frame using two conditions without using a for loop
