R group by | Count distinct values of one column grouped by another column

Time: 2018-06-20 17:07:55

Tags: r

How can I count the number of distinct visit_id values for each page name?

visit_id  post_pagename
1       A
1       B
1       C
1       D 
2       A
2       A
3       A
3       B

The result should be:

post_pagename distinct_visit_ids
A     3
B     2
C     1
D     1

I tried:
test_df<-data.frame(cbind(c(1,1,1,1,2,2,3,3),c("A","B","C","D","A","A","A","B")))
colnames(test_df)<-c("visit_id","post_pagename")
test_df

test_df %>%
  group_by(post_pagename) %>%
  summarize(vis_count = n_distinct(visit_id))

However, this only gives me the number of distinct visit_id values in the whole dataset, not per page name.

2 Answers:

Answer 0: (score: 3)

One way:

test_df %>%
  distinct() %>%
  count(post_pagename)

#   post_pagename     n
#   <fct>         <int>
# 1 A                 3
# 2 B                 2
# 3 C                 1
# 4 D                 1

Or another:

test_df %>%
  group_by(post_pagename) %>%
  summarise(distinct_visit_ids = n_distinct(visit_id))

# A tibble: 4 x 2
#  post_pagename distinct_visit_ids
#  <fct>                      <int>
#1 A                              3
#2 B                              2
#3 C                              1
#4 D                              1

*D has one visit, so it must be counted*
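If the grouped version still returns a single number for the whole dataset, one possible cause is that plyr was loaded after dplyr, so that plyr's summarize(), which ignores the grouping, masks dplyr's. This is only an assumption, since the question does not show which packages are attached. A minimal sketch that sidesteps the masking by namespacing the calls (the name result is just for illustration):

```r
library(dplyr)

# Same data as in the question, built directly so visit_id stays numeric
test_df <- data.frame(
  visit_id = c(1, 1, 1, 1, 2, 2, 3, 3),
  post_pagename = c("A", "B", "C", "D", "A", "A", "A", "B")
)

# Explicit dplyr:: prefixes make sure dplyr's verbs are used,
# even if another package defines functions with the same names
result <- test_df %>%
  dplyr::group_by(post_pagename) %>%
  dplyr::summarise(distinct_visit_ids = dplyr::n_distinct(visit_id))

print(result)
```

If this gives one row per page name while the unprefixed call does not, the attached packages are worth checking.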

Answer 1: (score: 1)

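The function n_distinct() counts the number of distinct values in the vector you pass it. Since your data contains the row "2 A" twice, another option is to remove the duplicate rows with unique() first and then use n(), which counts the number of rows in each group of your grouping variable.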

test_df<-data.frame(cbind(c(1,1,1,1,2,2,3,3),c("A","B","C","D","A","A","A","B")))
colnames(test_df)<-c("visit_id","post_pagename")
test_df


test_df %>%
  unique() %>%
  group_by(post_pagename) %>%
  summarize(vis_count = n())

This should work fine.
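The same idea (drop duplicate rows, then count rows per page name) can also be sketched in base R without dplyr. This is just one possible equivalent, not part of the original answer; the names deduped and res are made up for the example:

```r
# Same data as in the question, built directly so visit_id stays numeric
test_df <- data.frame(
  visit_id = c(1, 1, 1, 1, 2, 2, 3, 3),
  post_pagename = c("A", "B", "C", "D", "A", "A", "A", "B")
)

# Remove the duplicated (visit_id, post_pagename) row "2 A"
deduped <- unique(test_df)

# Count the remaining rows per page name
res <- aggregate(visit_id ~ post_pagename, data = deduped, FUN = length)
names(res)[2] <- "distinct_visit_ids"

print(res)
```

After unique(), each remaining row is one distinct visit_id for its page, so counting rows with length gives the distinct count.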

Hope it helps :)