I have a data frame like this:
df1 <- read.table(header=TRUE, text="dataset stock Google Yahoo GTM Microsoft
dataset1 stock1 1 1 1 0
dataset1 stock2 1 0 0 0
dataset1 stock3 1 1 0 0
dataset2 stock1 0 1 1 1
dataset2 stock2 0 0 1 0
dataset3 stock2 1 1 1 0")
I would like to get the row and column sums, like this:
dataset stock Google Yahoo GTM Microsoft sum_row
dataset1 stock1 1 1 1 0 3
dataset1 stock2 1 0 0 0 1
dataset1 stock3 1 1 0 0 2
dataset2 stock1 0 1 1 1 3
dataset2 stock2 0 0 1 0 1
dataset3 stock2 1 1 1 0 3
sum_col sum_col 4 4 4 1
and then remove the columns whose sum_col is less than or equal to 1.
Answer 0 (score: 2)
We can use addmargins to create a 'Sum' column and row for the numeric columns after converting them to a matrix. Then remove the columns whose sum is less than 2.
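# Numeric columns as a matrix with row names 1..n, plus a 'Sum' row and column
d1 <- addmargins(`row.names<-`(as.matrix(df1[-(1:2)]), 1:nrow(df1)))
# Keep only the columns whose entry in the last ('Sum') row is greater than 1
d1[,d1[nrow(d1),]>1]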
# Google Yahoo GTM Sum
#1 1 1 1 3
#2 1 0 0 1
#3 1 1 0 2
#4 0 1 1 3
#5 0 0 1 1
#6 1 1 1 3
#Sum 4 4 4 13
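To make the two steps easier to follow, here is a minimal sketch on a toy matrix. The matrix `m` and the names `with_totals` and `kept` are made up for illustration and are not part of the original answer; it assumes the default `FUN = sum`, which labels the added margins `Sum`.

```r
# Toy 2 x 3 matrix (illustrative data only); addmargins needs dimnames
m <- matrix(c(1, 0, 1, 1, 0, 0), nrow = 2,
            dimnames = list(c("a", "b"), c("x", "y", "z")))

# Append a 'Sum' row (column totals) and a 'Sum' column (row totals)
with_totals <- addmargins(m)

# Keep the columns whose total in the 'Sum' row is greater than 1;
# drop = FALSE keeps the result two-dimensional even if one column survives
kept <- with_totals[, with_totals["Sum", ] > 1, drop = FALSE]
print(kept)
```

Only `y` (total 2) and the `Sum` column itself (grand total 3) pass the filter here, which is the same reason the `Sum` column survives in the output above.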
Another option is rowSums/colSums:
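# Row and column totals of the numeric columns
sum_row <- rowSums(df1[-(1:2)])
sum_col <- colSums(df1[-(1:2)])
# Make sure the id columns are character so a 'sum_col' label row can be appended
df1[1:2] <- lapply(df1[1:2], as.character)
dfN <- rbind(df1[1:2], list('sum_col', 'sum_col'))
dfV <- rbind(df1[-(1:2)], as.list(sum_col))
res <- cbind(dfN, dfV, sum_row=c(sum_row, sum(sum_col)))
# Drop the columns whose column total is less than 2
res[setdiff(names(res), names(which(sum_col<2)))]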
Answer 1 (score: 2)
Using rowSums and colSums together with rbind.fill from the plyr package:
library(plyr)
df1$rowsums <- rowSums(df1[,-(1:2)])
# Append the column totals as a new row; the id columns are filled with NA
df1 <- rbind.fill(df1, as.data.frame(t(colSums(df1[,-(1:2)]))))
df1
# dataset stock Google Yahoo GTM Microsoft rowsums
#1 dataset1 stock1 1 1 1 0 3
#2 dataset1 stock2 1 0 0 0 1
#3 dataset1 stock3 1 1 0 0 2
#4 dataset2 stock1 0 1 1 1 3
#5 dataset2 stock2 0 0 1 0 1
#6 dataset3 stock2 1 1 1 0 3
#7 <NA> <NA> 4 4 4 1 13
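Note that the result printed above still contains the Microsoft column, whose total is 1, so the filtering step from the question has not been applied yet. Below is a minimal base R sketch that does both steps in one place, without plyr. The helper name `add_totals` and its parameters `id_cols` and `min_total` are hypothetical and introduced only for this illustration; the totals row is labelled `sum_col` as in the question, and the grand total is placed under `sum_row`.

```r
df1 <- read.table(header = TRUE, stringsAsFactors = FALSE,
                  text = "dataset stock Google Yahoo GTM Microsoft
dataset1 stock1 1 1 1 0
dataset1 stock2 1 0 0 0
dataset1 stock3 1 1 0 0
dataset2 stock1 0 1 1 1
dataset2 stock2 0 0 1 0
dataset3 stock2 1 1 1 0")

# Hypothetical helper: append row and column totals, keeping only the
# value columns whose column total is at least min_total.
add_totals <- function(df, id_cols = 1:2, min_total = 2) {
  values <- df[-id_cols]
  col_totals <- colSums(values)
  keep <- names(col_totals)[col_totals >= min_total]

  # Row sums are taken over all value columns, including dropped ones
  out <- cbind(df[id_cols], values[keep], sum_row = rowSums(values))

  # Build the totals row as a named list so rbind matches it by column
  totals <- c(as.list(rep("sum_col", length(id_cols))),
              as.list(col_totals[keep]),
              list(sum(col_totals)))
  names(totals) <- names(out)
  rbind(out, totals)
}

res <- add_totals(df1)
print(res)
```

Computing `sum_row` before dropping anything keeps the row totals consistent with the expected output in the question, where the Microsoft values still count towards each row sum.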