Get column sums and drop columns using a threshold

Time: 2016-03-08 07:24:36

Tags: r

I have a data frame like this:

df1 <- read.table(header=T, text="dataset  stock  Google Yahoo GTM Microsoft
dataset1 stock1 1       1     1    0
dataset1 stock2 1       0     0    0
dataset1 stock3 1       1     0    0
dataset2 stock1 0       1     1    1
dataset2 stock2 0       0     1    0
dataset3 stock2 1       1     1    0")

I would like to get the row and column sums, like this:

    dataset  stock   Google Yahoo GTM Microsoft sum_row
    dataset1 stock1  1      1     1   0         3
    dataset1 stock2  1      0     0   0         1
    dataset1 stock3  1      1     0   0         2
    dataset2 stock1  0      1     1   1         3
    dataset2 stock2  0      0     1   0         1
    dataset3 stock2  1      1     1   0         3
    sum_col  sum_col 4      4     4   1

and then drop the columns whose sum_col is equal to or less than 1.

2 Answers:

Answer 0 (score: 2)

We can use addmargins to add a 'Sum' row and column to the numeric columns after converting them to a matrix. Then we drop the columns whose sum is less than 2.

d1 <- addmargins(`row.names<-`(as.matrix(df1[-(1:2)]), 1:nrow(df1)))
d1[,d1[nrow(d1),]>1]
 #     Google Yahoo GTM Sum
 #1        1     1   1   3
 #2        1     0   0   1
 #3        1     1   0   2
 #4        0     1   1   3
 #5        0     0   1   1
 #6        1     1   1   3
 #Sum      4     4   4  13
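
To see what addmargins contributes on its own, here is a minimal sketch on a small made-up matrix. The object names m, am, keep and kept are hypothetical and are not part of the original answer; the filter is the same "last row greater than 1" test used above.

```r
# Hypothetical 2 x 3 matrix used only for illustration (filled column-wise)
m <- matrix(c(1, 0, 1, 1, 0, 0), nrow = 2,
            dimnames = list(c("a", "b"), c("x", "y", "z")))

# addmargins() appends a "Sum" row and a "Sum" column by default
am <- addmargins(m)

# The last row holds the column totals; keep columns whose total exceeds 1
keep <- am[nrow(am), ] > 1
kept <- am[, keep, drop = FALSE]
kept
```

Because the grand total also sits in the last row, the 'Sum' column survives the filter whenever the overall total is above the threshold.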

Another option is to use rowSums/colSums:

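sum_row <- rowSums(df1[-(1:2)])   # row totals over the numeric columns
sum_col <- colSums(df1[-(1:2)])   # column totals over the numeric columns

# make the id columns character so a 'sum_col' label row can be appended
df1[1:2] <- lapply(df1[1:2], as.character)
dfN <- rbind(df1[1:2], list('sum_col', 'sum_col'))
dfV <- rbind(df1[-(1:2)], as.list(sum_col))

# combine, then drop the columns whose total is below 2
res <- cbind(dfN, dfV, sum_row=c(sum_row, sum(sum_col)))
res[setdiff(names(res), names(which(sum_col<2)))]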
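
If the threshold needs to vary, the rowSums/colSums idea can be wrapped in a small helper. This is only a sketch: the function name drop_low_cols and its arguments id_cols and threshold are hypothetical, and it deliberately leaves out the extra 'sum_col' row, returning just the filtered columns plus the row totals.

```r
# Hypothetical helper; the name and defaults are illustrative assumptions
drop_low_cols <- function(df, id_cols = 1:2, threshold = 1) {
  num  <- df[-id_cols]                   # numeric indicator columns
  keep <- colSums(num) > threshold       # TRUE where the column total exceeds the threshold
  out  <- cbind(df[id_cols], num[keep])  # id columns plus the surviving columns
  out$sum_row <- rowSums(num)            # row totals over all numeric columns, as in the question
  out
}

df1 <- read.table(header = TRUE, text = "dataset  stock  Google Yahoo GTM Microsoft
dataset1 stock1 1       1     1    0
dataset1 stock2 1       0     0    0
dataset1 stock3 1       1     0    0
dataset2 stock1 0       1     1    1
dataset2 stock2 0       0     1    0
dataset3 stock2 1       1     1    0")

res <- drop_low_cols(df1)
names(res)
```

The row totals are computed before any column is dropped, so they match the sum_row values in the question even though Microsoft is removed.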

Answer 1 (score: 2)

Using rowSums and colSums together with rbind.fill from the plyr package:

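library(plyr)  # provides rbind.fill

# add the row totals as a new column
df1$rowsums <- rowSums(df1[,-(1:2)])
# append the column totals as a new row; rbind.fill pads the missing id columns with NA
df1 <- rbind.fill(df1, as.data.frame(t(colSums(df1[,-(1:2)]))))
df1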
#   dataset  stock Google Yahoo GTM Microsoft rowsums
#1 dataset1 stock1      1     1   1         0       3
#2 dataset1 stock2      1     0   0         0       1
#3 dataset1 stock3      1     1   0         0       2
#4 dataset2 stock1      0     1   1         1       3
#5 dataset2 stock2      0     0   1         0       1
#6 dataset3 stock2      1     1   1         0       3
#7     <NA>   <NA>      4     4   4         1      13
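
The output above still contains the Microsoft column, so the thresholding step the question asks for remains to be done. The sketch below is one possible way to finish. To stay free of package dependencies it rebuilds the same table in base R instead of calling rbind.fill, which is a substitution and not the answer's own method. The names totals, last, low and out are hypothetical.

```r
df1 <- read.table(header = TRUE, stringsAsFactors = FALSE,
                  text = "dataset  stock  Google Yahoo GTM Microsoft
dataset1 stock1 1       1     1    0
dataset1 stock2 1       0     0    0
dataset1 stock3 1       1     0    0
dataset2 stock1 0       1     1    1
dataset2 stock2 0       0     1    0
dataset3 stock2 1       1     1    0")

# Row totals as a column, then column totals as an extra row
df1$rowsums <- rowSums(df1[, -(1:2)])
totals <- colSums(df1[, -(1:2)])
df1[nrow(df1) + 1, names(totals)] <- totals  # id columns in the new row become NA

# Read the totals row back and drop columns at or below the threshold of 1
last <- unlist(df1[nrow(df1), -(1:2)])       # named numeric vector of totals
low  <- names(last)[last <= 1]
out  <- df1[setdiff(names(df1), low)]
out
```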