Question

I have a data frame with the following columns:

Timestamp - POSIXct
Number of users - integer
Number of schools - integer
Country code - factor


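What I want to do is create a new column for the whole data frame that holds the sum of the number of users, grouped by timestamp and country code. For example, for timestamp A (2019-03-01), the total number of users for country x is ..., and for country y it is ....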

I tried the dplyr package, in particular the mutate function, but somehow it did not work.

I also tried ggplot with stat_summary, but instead of the total, ggplot showed me the number of users for each country at each timestamp.

Sample data (as dput output):

structure(list(date_intervall = structure(c(1559340000, 1559340000, 
1559340000, 1559340000, 1561932000, 1561932000, 1561932000, 1561932000, 
1561932000, 1561932000, 1561932000, 1564610400, 1564610400, 1564610400, 
1564610400, 1564610400, 1564610400, 1564610400, 1567288800, 1567288800, 
1567288800, 1567288800, 1567288800, 1567288800, 1567288800), class = c("POSIXct", 
"POSIXt"), tzone = ""), number_of_students = c(28470L, 28L, 54L, 
754L, 1376L, 2299L, 2632L, 28470L, 28L, 68L, 1003L, 1380L, 2299L, 
3584L, 28470L, 28L, 69L, 1003L, 1384L, 2350L, 5078L, 28470L, 
28L, 72L, 1003L), number_of_schools = c(66L, 1L, 2L, 1L, 6L, 
4L, 10L, 66L, 1L, 3L, 1L, 6L, 4L, 15L, 66L, 1L, 3L, 1L, 6L, 4L, 
22L, 66L, 1L, 3L, 1L), country_code = structure(c(3L, 3L, 4L, 
5L, 1L, 2L, 2L, 3L, 3L, 4L, 5L, 1L, 2L, 2L, 3L, 3L, 4L, 5L, 1L, 
2L, 2L, 3L, 3L, 4L, 5L), .Label = c("AU", "ID", "PL", "SG", "VN"
), class = "factor")), row.names = 86:110, class = "data.frame")

Answer 1

I am still not sure I fully understand the question, but try the following.

First, load the required packages.

library(dplyr)
library(ggplot2)

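Now, if you want to add a new column to the original dataset, use mutate rather than summarise, and assign the result back to d (the name used here for your data frame).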

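d <- d %>%
  group_by(date_intervall, country_code) %>%
  # mutate keeps every row and repeats the group total on each of them
  mutate(total_students = sum(number_of_students)) %>%
  # drop the grouping so later operations on d are not silently grouped
  ungroup()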
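
To make the difference between mutate and summarise concrete, here is a minimal sketch on a small made-up data frame. The object names toy, with_total and totals_only and the values are hypothetical, not taken from the question; only the column names match. It also assumes dplyr 1.0.0 or later for the .groups argument.

```r
library(dplyr)

# Hypothetical toy data (not the asker's data), same column names
toy <- data.frame(
  date_intervall = as.POSIXct(
    c("2019-06-01", "2019-06-01", "2019-06-01", "2019-07-01"), tz = "UTC"
  ),
  country_code = factor(c("PL", "PL", "SG", "PL")),
  number_of_students = c(10L, 5L, 7L, 3L)
)

# mutate: one row per original row, group total repeated within each group
with_total <- toy %>%
  group_by(date_intervall, country_code) %>%
  mutate(total_students = sum(number_of_students)) %>%
  ungroup()

# summarise: one row per (timestamp, country) combination
totals_only <- toy %>%
  group_by(date_intervall, country_code) %>%
  summarise(total_students = sum(number_of_students), .groups = "drop")

nrow(with_total)   # same number of rows as toy
nrow(totals_only)  # one row per group
```

The mutate version suits the "new column on the whole data frame" request, while the summarise version is the more convenient shape for plotting.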

To plot the totals, use summarise and pipe the result into ggplot. Note that I use geom_col rather than geom_bar for the plot.

d %>%
  group_by(date_intervall, country_code) %>%
  summarise(total_students = sum(number_of_students)) %>%
  ggplot(aes(x = date_intervall, y = total_students, fill = country_code)) + 
  geom_col()
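
As a side note on the geom_col choice, the sketch below shows the two spellings side by side on made-up, already summarised data. The names toy_totals, p_col and p_bar are hypothetical. geom_col draws the supplied y values as bar heights, whereas geom_bar counts rows by default, so it needs stat = "identity" to behave the same way.

```r
library(ggplot2)

# Hypothetical pre-summarised totals (not the asker's data)
toy_totals <- data.frame(
  date_intervall = as.POSIXct(
    c("2019-06-01", "2019-06-01", "2019-07-01"), tz = "UTC"
  ),
  country_code = c("PL", "SG", "PL"),
  total_students = c(15L, 7L, 3L)
)

# geom_col(): bar height is taken directly from y
p_col <- ggplot(toy_totals,
                aes(x = date_intervall, y = total_students, fill = country_code)) +
  geom_col()

# geom_bar(): defaults to stat = "count", so the stat is set explicitly here
p_bar <- ggplot(toy_totals,
                aes(x = date_intervall, y = total_students, fill = country_code)) +
  geom_bar(stat = "identity")
```

Printing p_col or p_bar should draw the same stacked bars, one bar per timestamp with one segment per country.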
