I want to use dplyr to summarise my data frame small for each distinct Video.ID.
small %>%
  group_by(Video.ID) %>%
  summarise(sumr = sum(Partner.Revenue),
            len = mean(Video.Duration..sec.),
            cat = mean(Category))
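mean(Category) is obviously the wrong approach, since Category is a character column. How can I carry a value that is simply repeated into the summary? (A given Video.ID always has the same category, no matter how often it appears in the data frame.)
My data frame looks like this: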
small
# A tibble: 6 x 7
X1 X1_1 Video.ID Video.Duration..sec. Category Owned.Views Partner.Revenue
<int> <int> <chr> <int> <chr> <int> <dbl>
1 1 1 ---0zh9uzSE 1184 gadgets 6 0
2 2 2 ---0zh9uzSE 1184 gadgets 6 0
3 3 3 ---0zh9uzSE 1184 gadgets 2 0
4 4 4 ---0zh9uzSE 1184 gadgets 1 0
5 5 5 ---0zh9uzSE 1184 gadgets 1 0
6 6 6 ---0zh9uzSE 1184 gadgets 3 0
small <-
structure(list(X1 = 1:6,
X1_1 = 1:6,
Video.ID = c("---0zh9uzSE", "---0zh9uzSE", "---0zh9uzSE", "---0zh9uzSE", "---0zh9uzSE", "---0zh9uzSE"),
Video.Duration..sec. = c(1184L, 1184L, 1184L, 1184L, 1184L, 1184L),
Category = c("gadgets", "gadgets", "gadgets", "gadgets", "gadgets", "gadgets"),
Owned.Views = c(6L, 6L, 2L, 1L, 1L, 3L),
Partner.Revenue = c(0, 0, 0, 0, 0, 0)),
row.names = c(NA, -6L),
class = c("tbl_df", "tbl", "data.frame"))
Answer 0 (score: 4)
There are at least two ways to solve this.
Add the Category column to group_by:
small %>%
group_by(Video.ID, cat = Category) %>%
summarise(sumr = sum(Partner.Revenue),
len = mean(Video.Duration..sec.))
# A tibble: 1 x 4
# Groups: Video.ID [?]
# Video.ID cat sumr len
# <chr> <chr> <dbl> <dbl>
# 1 ---0zh9uzSE gadgets 0 1184
Or use unique(Category):
small %>%
group_by(Video.ID) %>%
summarise(sumr = sum(Partner.Revenue),
len = mean(Video.Duration..sec.),
cat = unique(Category))
# A tibble: 1 x 4
# Video.ID sumr len cat
# <chr> <dbl> <dbl> <chr>
# 1 ---0zh9uzSE 0 1184 gadgets
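I would probably recommend the first option: if an ID ever has more than one category, it still works and returns one row per Video.ID/Category pair, whereas unique(Category) would then yield more than one value for that group.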
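To make that difference concrete, here is a minimal sketch on made-up data. The data frame toy and its values are hypothetical (not from the question), and the .groups argument assumes dplyr 1.0.0 or later. It shows the first option on an ID with two categories, plus a quick n_distinct() check for whether the "one category per ID" assumption actually holds.

```r
library(dplyr)

# Hypothetical data: video "a" appears under two different categories
toy <- data.frame(
  Video.ID        = c("a", "a", "a", "b"),
  Category        = c("gadgets", "gadgets", "music", "music"),
  Partner.Revenue = c(1, 2, 3, 4),
  stringsAsFactors = FALSE
)

# Option 1: grouping by both columns gives one row per ID/category pair
by_pair <- toy %>%
  group_by(Video.ID, Category) %>%
  summarise(sumr = sum(Partner.Revenue), .groups = "drop")

# Sanity check: count distinct categories per ID to spot violations
check <- toy %>%
  group_by(Video.ID) %>%
  summarise(n_cat = n_distinct(Category))

print(by_pair)
print(check)
```

Any ID with n_cat greater than 1 in check is one where unique(Category) or Category[1] would not give a single, unambiguous answer.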
Answer 1 (score: 1)
Since the category is unique for each Video.ID, you can simply take the first value with cat = Category[1], as in
small %>%
  group_by(Video.ID) %>%
  summarise(sumr = sum(Partner.Revenue),
            len = mean(Video.Duration..sec.),
            cat = Category[1])
# A tibble: 1 x 4
# Video.ID sumr len cat
# <chr> <dbl> <dbl> <chr>
# 1 ---0zh9uzSE 0 1184 gadgets
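An equivalent spelling, offered only as a sketch, uses dplyr's first() helper instead of indexing with [1]. The small data frame below is a cut-down copy of the question's data (only the columns the summary uses). Using first() for the duration too is my assumption that the duration, like the category, is constant within a Video.ID.

```r
library(dplyr)

# Cut-down copy of the question's data (only the columns used here)
small <- data.frame(
  Video.ID             = rep("---0zh9uzSE", 6),
  Video.Duration..sec. = rep(1184L, 6),
  Category             = rep("gadgets", 6),
  Partner.Revenue      = rep(0, 6),
  stringsAsFactors = FALSE
)

res <- small %>%
  group_by(Video.ID) %>%
  summarise(sumr = sum(Partner.Revenue),
            len  = first(Video.Duration..sec.),  # assumes constant per ID
            cat  = first(Category))              # same as Category[1]

print(res)
```

Like Category[1], first(Category) silently keeps the first value even if a group contains several categories, so it is only appropriate when the one-category-per-ID assumption holds.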