我想弄清楚分组中的值是否与分组中的其他值足够不同。具体来说,我想算出同一位学生的比赛结束时间是否与同一天另一堂课的开始时间相匹配。使用菱形,这是等效的代码:
library(ggplot2)
diamonds %>% group_by(color, cut) %>%
mutate(clash = sum(
lapply(
diamonds %>%
filter(color == color, cut == cut, carat != carat) %$% carat,
function(x) ifelse(x < carat - 0.01 && x > carat + 0.01, 1, 0)))) %>%
arrange(color, cut, clash)
计划是如果冲突大于1,那么我知道另一颗钻石的克拉大小与该组中的钻石非常接近。这给了我以下错误:
Error in sum(sapply(diamonds %>% filter(color == color, cut == cut, carat != :
invalid 'type' (list) of argument
这使对钻石的第二次通话显得晦涩
答案 0 :(得分:2)
您可以使用pmap
代替lapply
,它更适合tidyverse
内:
library(tidyverse)
myfun <- function(.color, .cut, .carat){
diamonds %>%
filter(color == .color, cut == .cut, !between(carat, .carat - 0.01, .carat + 0.01)) %>%
nrow()
}
diamonds %>%
mutate(clash = pmap_int(list(color, cut, carat), myfun)) %>%
arrange(color, cut, clash)
# A tibble: 53,940 x 11
carat cut color clarity depth table price x y z clash
<dbl> <ord> <ord> <ord> <dbl> <dbl> <int> <dbl> <dbl> <dbl> <int>
1 1.01 Fair D SI2 64.6 56 3003 6.31 6.24 4.05 124
2 1.01 Fair D SI2 64.7 57 3871 6.31 6.27 4.07 124
3 1.01 Fair D SI1 66.3 55 4118 6.22 6.17 4.11 124
4 1.01 Fair D SI2 65.3 55 4205 6.33 6.19 4.09 124
5 1.01 Fair D SI1 65.9 60 4276 6.32 6.18 4.12 124
6 1.01 Fair D SI2 64.6 62 4538 6.26 6.21 4.03 124
7 1.01 Fair D SI1 63.5 58 4751 6.35 6.25 4 124
8 1.01 Fair D SI1 64.6 60 4751 6.12 6.08 3.94 124
9 1.01 Fair D SI1 66.9 54 4751 6.25 6.21 4.17 124
10 1.01 Fair D SI1 66.2 56 5122 6.05 6.1 4.02 124
请注意,此解决方案有效,但效率不高。您可以轻松地修改以下代码以按组操作:
diamonds2 <- diamonds %>%
count(color, carat, cut)
myfun2 <- function(.color, .cut, .carat){
diamonds2 %>%
filter(color == .color, cut == .cut, !between(carat, .carat - 0.01, .carat + 0.01)) %>%
pull(n) %>% sum
}
diamonds2 %>%
mutate(clash = pmap_int(list(color, cut, carat), myfun2)) %>%
left_join(diamonds, ., by = c("color", "carat", "cut")) %>%
arrange(color, cut, clash)
结果相同,但是第二个版本(使用myfun2
)的速度更快。
要查看我们还使用clarity
进行过滤的示例,请参见以下示例:
diamonds3 <- diamonds %>%
count(color, carat, cut, clarity)
myfun3 <- function(.color, .cut, .carat, .clarity){
diamonds3 %>%
filter(color == .color, cut == .cut, clarity == .clarity,
!between(carat, .carat - 0.01, .carat + 0.01)) %>%
pull(n) %>% sum
}
myfun3(.color = "D", .cut == "Fair", .clarity = "I1", .carat = 1.5)
[1] 3