Sum a group of columns by row count

Question

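I am trying to create a new dataset from an existing one. The new dataset should combine every 60 rows of the original, so that per-second event counts are converted into per-minute totals. The number of columns is generally not known in advance.

For example, for this dataset, if we group it in blocks of 3 rows:

we would get this data.frame. Row 1 contains the column sums of rows 1-3 of d1, and row 2 contains the column sums of rows 4-6: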

I have tried d2<-colSums(d1[seq(1,NROW(d1),3),]), which is about as close as I have been able to get.
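
To make it clearer where that attempt falls short, here is a small sketch of what it computes. It assumes d1 holds the six-row example values that Answer 2 uses as "your data"; the names picked and attempt are only for illustration.

```r
# Assumption: d1 holds the six-row example values from Answer 2.
d1 <- data.frame(a = c(1, 0, 0, 0, 0, 1),
                 b = c(1, 1, 1, 0, 0, 0),
                 c = c(0, 0, 0, 1, 1, 1),
                 d = c(1, 1, 0, 0, 0, 0))

# seq() picks only the first row of each block of three: rows 1 and 4.
picked <- seq(1, NROW(d1), 3)

# colSums() then adds just those two rows together, so the result is a
# single named vector rather than one row of totals per block.
attempt <- colSums(d1[picked, ])
attempt
```

So the indexing selects every third row instead of grouping three consecutive rows, and colSums() collapses everything it is given into one vector.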

I have also looked at the suggestions in How to sum rows based on multiple conditions - R?, How to select every xth row from table, Remove last N rows in data frame with the arbitrary number of rows, sum two columns in R and Merging multiple rows into single row. I am all out of ideas. Any help would be greatly appreciated.

Answer 1

After reading Split up a dataframe by number of rows, I realized that the only thing you need to know is how you want to split() d1.

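In this case, you want to split d1 into multiple data frames of 3 rows each. To do that, you can use rep() to specify that each element of the sequence 1:2 should be repeated three times (the number of rows divided by the length of the sequence).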
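
As a rough sketch of that grouping vector, assuming six rows and blocks of three as in the example (n_rows, block_size and the two group_id names are made up for illustration):

```r
# Assumption: six rows in d1, grouped in blocks of three.
n_rows <- 6
block_size <- 3

# One group id per row: 1 1 1 2 2 2
group_id <- rep(1:2, each = block_size)

# The same vector without hard-coding 1:2; this form assumes the row
# count is an exact multiple of block_size.
group_id_general <- rep(seq_len(n_rows / block_size), each = block_size)
group_id_general
```

Passing a vector like this as the second argument of split() is what cuts the data frame into consecutive blocks.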

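After that, the logic involves using map() to sum each column of every data frame created by d1 %>% split(). Here summarize_all() is helpful because you do not need to know the column names in advance.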

Once the calculation is done, you can use bind_rows() to stack all the results back into a single data frame.
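
The code for this answer did not survive in the thread, so the following is only a sketch of how the steps described here could fit together, not the original code. It assumes the dplyr and purrr packages and the six-row example values that Answer 2 uses as "your data".

```r
library(dplyr)
library(purrr)

# Assumption: d1 holds the six-row example values from Answer 2.
d1 <- data.frame(a = c(1, 0, 0, 0, 0, 1),
                 b = c(1, 1, 1, 0, 0, 0),
                 c = c(0, 0, 0, 1, 1, 1),
                 d = c(1, 1, 0, 0, 0, 0))

d2 <- d1 %>%
  split(rep(1:2, each = 3)) %>%       # list of two 3-row data frames
  map(~ summarise_all(.x, sum)) %>%   # one row of column sums per piece
  bind_rows()                         # stack the pieces into one data frame

d2
```

Because summarise_all() works on whatever columns are present, nothing in the pipeline needs the column names or the column count.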

Answer 2

Create a grouping variable, `group_by` it, and then `summarise_all`.

# your data
d <- data.frame(a = c(1,0,0,0,0,1),
                b = c(1,1,1,0,0,0),
                c = c(0,0,0,1,1,1),
                d = c(1,1,0,0,0,0))

# create the grouping variable 
d$group <- rep(c("A","B"), each = 3)

# sum every column within each group
library(dplyr)
d %>% 
  group_by(group) %>% 
  summarise_all(sum)

Returns:

# A tibble: 2 x 5
  group     a     b     c     d
  <chr> <dbl> <dbl> <dbl> <dbl>
1 A         1     3     0     2
2 B         1     0     3     0
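
For the 60-rows-per-minute case in the question, the grouping variable can be derived from the row number instead of being typed out. This is only a sketch: per_second, per_minute and block_size are hypothetical names, and the data is made up so that the last block is incomplete.

```r
library(dplyr)

# Hypothetical per-second data: 150 rows, so the third "minute" is partial.
per_second <- data.frame(x = rep(1, 150),
                         y = seq_len(150))

block_size <- 60

per_minute <- per_second %>%
  # rows 1-60 get group 1, rows 61-120 get group 2, and so on
  mutate(group = (row_number() - 1) %/% block_size + 1) %>%
  group_by(group) %>%
  summarise_all(sum)

per_minute
```

Integer division copes with a row count that is not a multiple of 60: the leftover rows simply form a shorter final group, which you may or may not want to keep.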
