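I hope the convoluted title makes sense; the problem I have is not easy to sum up in a headline.
A toy dataset lists customer visits, along with each customer's exemption status and the visit type: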
df <- structure(list(Customer = structure(c(8L, 2L, 5L, 4L, 4L, 1L,
1L, 6L, 6L, 7L, 7L, 7L, 3L, 3L, 3L), .Label = c("Aaron", "Elizabeth",
"Frank", "John", "Mary", "Pam", "Rob", "Sam"), class = "factor"),
Exemption = structure(c(2L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L,
2L, 2L, 2L, 1L, 1L, 1L), .Label = c("Exempt", "Non-exempt"
), class = "factor"), Type = structure(c(1L, 1L, 2L, 1L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 2L), .Label = c("Type 1",
"Type 2"), class = "factor")), .Names = c("Customer", "Exemption",
"Type"), class = "data.frame", row.names = c(NA, -15L))
    Customer  Exemption   Type
1        Sam Non-exempt Type 1
2  Elizabeth     Exempt Type 1
3       Mary     Exempt Type 2
4       John Non-exempt Type 1
5       John Non-exempt Type 2
6      Aaron Non-exempt Type 2
7      Aaron Non-exempt Type 2
8        Pam     Exempt Type 2
9        Pam     Exempt Type 2
10       Rob Non-exempt Type 2
11       Rob Non-exempt Type 2
12       Rob Non-exempt Type 1
13     Frank     Exempt Type 1
14     Frank     Exempt Type 1
15     Frank     Exempt Type 2
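I would like to group customers by their number of visits and then, within each group, compute the proportion of Type 1 and Type 2 visits, with the results also broken down by exemption status. For example, the output would look like this: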
   Number_of_visits  Exemption   Type Proportion
1                 1 Non-exempt Type 1       1.00
2                 1 Non-exempt Type 2       0.00
3                 1     Exempt Type 1       0.50
4                 1     Exempt Type 2       0.50
5                 2 Non-exempt Type 1       0.25
6                 2 Non-exempt Type 2       0.75
7                 2     Exempt Type 1       0.00
8                 2     Exempt Type 2       1.00
9                 3 Non-exempt Type 1       0.33
10                3 Non-exempt Type 2       0.67
11                3     Exempt Type 1       0.67
12                3     Exempt Type 2       0.33
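To make one cell of that target concrete, here is a minimal base R sketch for the non-exempt customers with two visits (John and Aaron, rows 4 to 7 of the data). The `two_visits` data frame is a hypothetical helper, typed in by hand only for this illustration:

```r
# Visits of the two non-exempt customers who each visited twice
# (rows 4-7 of the toy dataset; `two_visits` is an illustrative name).
two_visits <- data.frame(
  Customer = c("John", "John", "Aaron", "Aaron"),
  Type     = c("Type 1", "Type 2", "Type 2", "Type 2")
)

# Share of each visit type among these four visits:
# one Type 1 visit out of four, three Type 2 visits out of four.
type_share <- prop.table(table(two_visits$Type))
print(type_share)
```

These shares correspond to rows 5 and 6 of the desired output (0.25 and 0.75).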
I tried a few things with dplyr, such as group_by(Customer, Type) %>% summarise(n()), but that does not seem right.
Answer 0 (score: 1)
You can use count from dplyr to count the occurrences of Exemption and Type, grouped by Number_of_visits:
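library(dplyr)
library(tidyr)

res <- df %>%
  # Number of visits per customer
  group_by(Customer) %>%
  mutate(Number_of_visits = n()) %>%
  # Count each (Exemption, Type) pair within each Number_of_visits group
  group_by(Number_of_visits) %>%
  count(Exemption, Type) %>%
  # Add missing (Exemption, Type) pairs with a count of zero. Current dplyr
  # keeps only the Number_of_visits grouping after count(), so both columns
  # are passed to complete().
  complete(Exemption, Type, fill = list(n = 0)) %>%
  # Proportion of each Type within Number_of_visits and Exemption
  group_by(Number_of_visits, Exemption) %>%
  mutate(Proportion = n / sum(n))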
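Notes:
1. group_by Customer to compute the number of visits with n().
2. group_by Number_of_visits and use count to count the occurrences of each pair of Exemption and Type values. This creates a column named n that holds the count.
3. Use tidyr::complete to fill in any missing pairs of Exemption and Type values with a count of zero.
4. group_by Number_of_visits and Exemption to compute the required Proportion.

The result with your data is as expected: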
print(res)
##Source: local data frame [12 x 5]
##Groups: Number_of_visits, Exemption [6]
##
##   Number_of_visits  Exemption   Type     n Proportion
##              <int>     <fctr> <fctr> <dbl>      <dbl>
##1                 1     Exempt Type 1     1  0.5000000
##2                 1     Exempt Type 2     1  0.5000000
##3                 1 Non-exempt Type 1     1  1.0000000
##4                 1 Non-exempt Type 2     0  0.0000000
##5                 2     Exempt Type 1     0  0.0000000
##6                 2     Exempt Type 2     2  1.0000000
##7                 2 Non-exempt Type 1     1  0.2500000
##8                 2 Non-exempt Type 2     3  0.7500000
##9                 3     Exempt Type 1     2  0.6666667
##10                3     Exempt Type 2     1  0.3333333
##11                3 Non-exempt Type 1     1  0.3333333
##12                3 Non-exempt Type 2     2  0.6666667
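As a cross-check, the same proportions can be sketched in base R with table() and prop.table(). This is an alternative to the dplyr pipeline, not part of it. The data frame is retyped here so the snippet runs on its own, and the names `visits`, `tab`, `prop`, and `out` are illustrative only:

```r
# Toy dataset from the question, retyped so the snippet is self-contained
df <- data.frame(
  Customer  = c("Sam", "Elizabeth", "Mary", "John", "John", "Aaron", "Aaron",
                "Pam", "Pam", "Rob", "Rob", "Rob", "Frank", "Frank", "Frank"),
  Exemption = c("Non-exempt", "Exempt", "Exempt", "Non-exempt", "Non-exempt",
                "Non-exempt", "Non-exempt", "Exempt", "Exempt", "Non-exempt",
                "Non-exempt", "Non-exempt", "Exempt", "Exempt", "Exempt"),
  Type      = c("Type 1", "Type 1", "Type 2", "Type 1", "Type 2", "Type 2",
                "Type 2", "Type 2", "Type 2", "Type 2", "Type 2", "Type 1",
                "Type 1", "Type 1", "Type 2"),
  stringsAsFactors = TRUE
)

# Number of visits of the customer on each row
visits <- ave(seq_len(nrow(df)), df$Customer, FUN = length)

# Three-way contingency table; empty combinations are kept as zero counts
tab <- table(Number_of_visits = visits,
             Exemption        = df$Exemption,
             Type             = df$Type)

# Divide each count by its (Number_of_visits, Exemption) total
prop <- prop.table(tab, margin = c(1, 2))

# Long format, one row per combination, sorted like the output above
out <- as.data.frame(prop, responseName = "Proportion")
out <- out[order(out$Number_of_visits, out$Exemption, out$Type), ]
print(out)
```

Because table() keeps every combination of the three variables, no separate step is needed to fill in zero counts. Note that Number_of_visits comes back as a factor in `out`.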