Row sums over every 3 columns of a data frame

Asked: 2019-01-07 06:10:05

Tags: r dataframe

I have searched high and low and tried several approaches to this problem, but I have not been able to get the output described below.

I have a data frame df3 whose column headers are dates and whose values are 0 or 1, built as follows:

df = data.frame(replicate(6,sample(0:1,6,rep=TRUE)))
colnames(df) = c("1/1/2018","1/2/2018","1/3/2018","1/4/2018","1/5/2018","1/6/2018")
df2 = data.frame(c("A","B","C","D","E","F"))
colnames(df2) = c("CUST_ID")
df3 = cbind(df2,df)

Now I need df4, in which the row-wise sum of the first 3 date columns forms one column, and the same operation is repeated dynamically for each remaining group of 3 columns.

Options I have tried:

a) rbind.data.frame(apply(matrix(df3, nrow = n - 1), 1,sum))

b) col_list <- list(c("1/1/2018","1/2/2018","1/3/2018"), c("1/4/2018","1/5/2018","1/6/2018"))

lapply(col_list, function(x)sum(df3[,x])) %>% data.frame

2 answers:

Answer 0: (score: 0)

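One approach is to split df3 into groups of 3 columns with split.default. To define the groups, we generate a sequence with rep; then we take rowSums of each resulting data frame and finally cbind the results together.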

cbind(df3[1], sapply(split.default(df3[-1],  
         rep(1:ncol(df3), each = 3, length.out = (ncol(df3) -1))), rowSums))



#  CUST_ID 1 2
#1       A 1 1
#2       B 2 0
#3       C 2 1
#4       D 1 1
#5       E 2 2
#6       F 2 2

For reference, the sequence generated by rep is

rep(1:ncol(df3), each = 3, length.out = (ncol(df3) -1))
#[1] 1 1 1 2 2 2

which is what makes it possible to split on every 3 columns.
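To make the intermediate step visible, here is a minimal sketch on a small hand-made data frame. The name demo and its values are illustrative assumptions, not the OP's data; grp and parts are just local names for the grouping vector and the split result.

```r
# Hypothetical stand-in for df3 with fixed values, so the result is reproducible
demo <- data.frame(CUST_ID = c("A", "B"),
                   d1 = c(1, 0), d2 = c(1, 1), d3 = c(0, 0),
                   d4 = c(0, 1), d5 = c(1, 1), d6 = c(1, 1))

# Same grouping vector as in the answer: 1 1 1 2 2 2
grp <- rep(1:ncol(demo), each = 3, length.out = ncol(demo) - 1)

# split.default treats the data frame as a list of columns,
# so each element of `parts` is a data frame of 3 columns
parts <- split.default(demo[-1], grp)

names(parts)             # group labels "1" and "2"
sapply(parts, ncol)      # 3 columns in each piece
sapply(parts, rowSums)   # one column of row sums per group
```

For this input the row sums are 2 and 1 for the first group and 2 and 3 for the second.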

The results differ from the OP's because the OP used sample without calling set.seed.


If the rep call seems too long, we can use gl to generate the same column sequence:
gl(ncol(df3[-1])/3, 3)
#[1] 1 1 1 2 2 2
#Levels: 1 2

So the final code would be

cbind(df3[1], sapply(split.default(df3[-1], gl(ncol(df3[-1])/3, 3)), rowSums))
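Both versions above are written for a number of date columns that is a multiple of 3. As a rough sketch of how the same idea could be wrapped up for reuse, the helper below builds the grouping with ceiling so that a shorter final group is still summed, and labels each new column by the first and last column it covers. The function name sum_every_k, its arguments k and id_cols, the "_to_" naming scheme, and the demo data are all hypothetical and not part of the answer.

```r
# Hypothetical helper: row-wise sums over every `k` columns,
# leaving the first `id_cols` columns untouched
sum_every_k <- function(data, k = 3, id_cols = 1) {
  vals <- data[-seq_len(id_cols)]
  # Group index 1,1,1,2,2,2,...; the last group may hold fewer than k columns
  grp <- ceiling(seq_along(vals) / k)
  # do.call(cbind, ...) always yields a matrix, even for a single group
  sums <- do.call(cbind, lapply(split.default(vals, grp), rowSums))
  # Label each new column by the first and last source column it covers
  colnames(sums) <- unname(vapply(
    split(names(vals), grp),
    function(n) paste(n[1], n[length(n)], sep = "_to_"),
    character(1)
  ))
  cbind(data[seq_len(id_cols)], sums)
}

# Illustrative data with 7 value columns, so the last group has one column
demo <- data.frame(CUST_ID = c("A", "B"),
                   a = c(1, 0), b = c(1, 1), c = c(0, 0),
                   d = c(0, 1), e = c(1, 1), f = c(1, 1), g = c(1, 0))
out <- sum_every_k(demo)
names(out)   # "CUST_ID" "a_to_c" "d_to_f" "g_to_g"
```

Whether a leftover group should be summed or dropped depends on the use case; this sketch sums it.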

Answer 1: (score: 0)

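We can use seq to create the starting index of each group, take the corresponding subset of columns as a list, sum them element-wise with Reduce, and assign the results as new columns.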

df4 <- df3[1]
df4[paste0('col', c('123', '456'))] <- lapply(seq(2, ncol(df3), by = 3), 
                   function(i) Reduce(`+`, df3[i:min((i+2), ncol(df3))]))
df4
#  CUST_ID col123 col456
#1       A      2      2
#2       B      3      3
#3       C      1      3
#4       D      2      3
#5       E      2      1
#6       F      0      1
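The column names 'col123' and 'col456' above are hard-coded for exactly two groups. A possible variation, shown here only as a sketch, derives the names from the number of groups so the same code works for any column count. The names demo, starts, res, and the "sum_" prefix are illustrative assumptions.

```r
# Hypothetical fixed-value stand-in for df3
demo <- data.frame(CUST_ID = c("A", "B"),
                   d1 = c(1, 0), d2 = c(1, 1), d3 = c(0, 0),
                   d4 = c(0, 1), d5 = c(1, 1), d6 = c(1, 1))

# Starting column of each group of 3 (column 1 holds the ID)
starts <- seq(2, ncol(demo), by = 3)

res <- demo[1]
# One generated name per group: sum_1, sum_2, ...
res[paste0("sum_", seq_along(starts))] <- lapply(starts, function(i) {
  # Element-wise sum of up to 3 columns; min() guards a short last group
  Reduce(`+`, demo[i:min(i + 2, ncol(demo))])
})
res
```

One difference worth keeping in mind: Reduce with `+` propagates NA values, whereas rowSums accepts na.rm = TRUE if missing values should be ignored.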

Data

set.seed(123)
df  <- data.frame(replicate(6,sample(0:1,6,rep=TRUE)))
colnames(df) <- c("1/1/2018","1/2/2018","1/3/2018","1/4/2018","1/5/2018","1/6/2018")
df2 <-  data.frame(c("A","B","C","D","E","F"))
colnames(df2) = c("CUST_ID")
df3 <- cbind(df2, df)