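I am trying to write a function in R that filters my data set based on whether a row contains a zero in any single column. In addition, sometimes I only want to remove rows that are zero in every column.
And this is where it gets fun: not all columns contain numbers, and the number of columns can vary.
I have pasted some of my data below, together with the results I want to obtain.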
unfiltered:
ID GeneName DU145small DU145total PC3small PC3total
1 MIR22HG 33221.5 1224.55 2156.43 573.315
2 MIRLET7E 87566.1 7737.99 25039.3 16415.6
3 MIR612 0 0 530.068 0
4 MIR218-1 0 0 1166.88 701.253
5 MIR181B2 70723.2 3958.01 6209.85 1399.34
6 MIR218-2 0 0 0 0
7 MIR10B 787.516 330.556 0 20336.4
8 MIR3176 0 0 0 0
any row containing a zero removed:
ID GeneName DU145small DU145total PC3small PC3total
1 MIR22HG 33221.5 1224.55 2156.43 573.315
2 MIRLET7E 87566.1 7737.99 25039.3 16415.6
5 MIR181B2 70723.2 3958.01 6209.85 1399.34
only rows that are all zero removed:
ID GeneName DU145small DU145total PC3small PC3total
1 MIR22HG 33221.5 1224.55 2156.43 573.315
2 MIRLET7E 87566.1 7737.99 25039.3 16415.6
3 MIR612 0 0 530.068 0
4 MIR218-1 0 0 1166.88 701.253
5 MIR181B2 70723.2 3958.01 6209.85 1399.34
7 MIR10B 787.516 330.556 0 20336.4
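I did find a way to remove any row that has at least one zero, but it "cheats" by replacing all zeros with NA and then filtering with complete.cases.
Doing so also removes every row that has a zero in GeneName (as with MIR10B).
I could solve this with a for loop, but I have been told that loops in R are very inefficient, so I would like to avoid that solution.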
Edit: While Xin Yin's solution works well and keeps the data in a data frame, David Arenburg's answer should be more efficient and should be used.
Answer 0 (score: 10)
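Using data.table (assuming df is your data set):
library(data.table)
# Grouping by GeneName gives one row per group, which assumes every GeneName is unique.
# Inside each group, .SD holds the remaining columns; [, -1] drops ID before the zero test.
setDT(df)[, .SD[!all(.SD[, -1, with = FALSE] == 0)], by = GeneName]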
# GeneName ID DU145small DU145total PC3small PC3total
# 1: MIR22HG 1 33221.500 1224.550 2156.430 573.315
# 2: MIRLET7E 2 87566.100 7737.990 25039.300 16415.600
# 3: MIR612 3 0.000 0.000 530.068 0.000
# 4: MIR218-1 4 0.000 0.000 1166.880 701.253
# 5: MIR181B2 5 70723.200 3958.010 6209.850 1399.340
# 6: MIR10B 7 787.516 330.556 0.000 20336.400
Or, if you want to remove rows that contain any zero:
setDT(df)[, .SD[!any(.SD[, -1, with = FALSE] == 0)], by = GeneName]
# GeneName ID DU145small DU145total PC3small PC3total
# 1: MIR22HG 1 33221.5 1224.55 2156.43 573.315
# 2: MIRLET7E 2 87566.1 7737.99 25039.30 16415.600
# 3: MIR181B2 5 70723.2 3958.01 6209.85 1399.340
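If GeneName might not be unique, a variant that avoids grouping altogether could look like the sketch below. It is only an illustration and not part of the answer above: the table `dt` is rebuilt from the question's sample data, and the names `num_cols` and `n_zero` are made up here. It counts the zeros per row once and reuses that count for both filters.

```r
library(data.table)

# Sample data from the question, rebuilt by hand for this sketch
dt <- data.table(
  ID         = 1:8,
  GeneName   = c("MIR22HG", "MIRLET7E", "MIR612", "MIR218-1",
                 "MIR181B2", "MIR218-2", "MIR10B", "MIR3176"),
  DU145small = c(33221.5, 87566.1, 0, 0, 70723.2, 0, 787.516, 0),
  DU145total = c(1224.55, 7737.99, 0, 0, 3958.01, 0, 330.556, 0),
  PC3small   = c(2156.43, 25039.3, 530.068, 1166.88, 6209.85, 0, 0, 0),
  PC3total   = c(573.315, 16415.6, 0, 701.253, 1399.34, 0, 20336.4, 0)
)

# Assumption: every column except the two identifiers is a measurement column
num_cols <- setdiff(names(dt), c("ID", "GeneName"))

# Number of zeros in each row, computed over the measurement columns only
n_zero <- dt[, rowSums(.SD == 0), .SDcols = num_cols]

no_zero      <- dt[n_zero == 0]                 # drop rows with any zero
not_all_zero <- dt[n_zero < length(num_cols)]   # drop rows that are all zero
```

Because the zero count is computed without `by`, duplicated gene names are still tested row by row.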
Answer 1 (score: 5)
Remove rows that contain any zero:
df[!(rowSums(df[-c(1:2)] == 0) >= 1), ]
Remove rows that are all zero:
df[rowSums(abs(df[-c(1:2)])) != 0, ]
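Since the question asks for a single function that copes with non-numeric columns and a varying number of columns, the two one-liners could be wrapped roughly as follows. This is a sketch under assumptions: the function name `filter_zero_rows` and its arguments `mode` and `id_cols` are hypothetical, identifier columns are excluded by name, and NA values are treated as non-zero.

```r
# Hypothetical helper: drop rows with any zero ("any") or with only zeros ("all")
filter_zero_rows <- function(df, mode = c("any", "all"),
                             id_cols = c("ID", "GeneName")) {
  mode <- match.arg(mode)
  # Keep only numeric columns that are not identifiers
  value_cols <- setdiff(names(df), id_cols)
  value_cols <- value_cols[vapply(df[value_cols], is.numeric, logical(1))]
  if (length(value_cols) == 0) return(df)  # nothing to test
  # Zeros per row; na.rm = TRUE means an NA is not counted as a zero (assumption)
  n_zero <- rowSums(df[value_cols] == 0, na.rm = TRUE)
  drop <- if (mode == "any") n_zero > 0 else n_zero == length(value_cols)
  df[!drop, , drop = FALSE]
}

# Sample data from the question
df <- read.table(text = "
ID GeneName DU145small DU145total PC3small PC3total
1 MIR22HG 33221.5 1224.55 2156.43 573.315
2 MIRLET7E 87566.1 7737.99 25039.3 16415.6
3 MIR612 0 0 530.068 0
4 MIR218-1 0 0 1166.88 701.253
5 MIR181B2 70723.2 3958.01 6209.85 1399.34
6 MIR218-2 0 0 0 0
7 MIR10B 787.516 330.556 0 20336.4
8 MIR3176 0 0 0 0", header = TRUE)

any_removed <- filter_zero_rows(df, "any")
all_removed <- filter_zero_rows(df, "all")
```

Excluding ID by name matters here: ID is numeric too, so selecting columns by type alone would include it in the zero test.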
Answer 2 (score: 4)
Using rowSums on a subset of the columns, try the following:
#dummy data
df <- read.table(text="
ID GeneName DU145small DU145total PC3small PC3total
1 MIR22HG 33221.5 1224.55 2156.43 573.315
2 MIRLET7E 87566.1 7737.99 25039.3 16415.6
3 MIR612 0 0 530.068 0
4 MIR218-1 0 0 1166.88 701.253
5 MIR181B2 70723.2 3958.01 6209.85 1399.34
6 MIR218-2 0 0 0 0
7 MIR10B 787.516 330.556 0 20336.4
8 MIR3176 0 0 0 0",
header=TRUE)
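#remove rows with any zero (zero count per row is at least 1)
df[ !(rowSums(df[,colnames(df)[(3:ncol(df))]]==0)>=1), ]
#remove rows that are all zero (zero count equals the number of value columns)
df[ !(rowSums(df[,colnames(df)[(3:ncol(df))]]==0)==(ncol(df)-2)), ]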
Answer 3 (score: 1)
This works:
> (unfiltered <- read.table(text="
+ ID GeneName DU145small DU145total PC3small PC3total
+ 1 MIR22HG 33221.5 1224.55 2156.43 573.315
+ 2 MIRLET7E 87566.1 7737.99 25039.3 16415.6
+ 3 MIR612 0 0 530.068 0
+ 4 MIR218-1 0 0 1166.88 701.253
+ 5 MIR181B2 70723.2 3958.01 6209.85 1399.34
+ 6 MIR218-2 0 0 0 0
+ 7 MIR10B 787.516 330.556 0 20336.4
+ 8 MIR3176 0 0 0 0
+ ", header=T))
ID GeneName DU145small DU145total PC3small PC3total
1 1 MIR22HG 33221.500 1224.550 2156.430 573.315
2 2 MIRLET7E 87566.100 7737.990 25039.300 16415.600
3 3 MIR612 0.000 0.000 530.068 0.000
4 4 MIR218-1 0.000 0.000 1166.880 701.253
5 5 MIR181B2 70723.200 3958.010 6209.850 1399.340
6 6 MIR218-2 0.000 0.000 0.000 0.000
7 7 MIR10B 787.516 330.556 0.000 20336.400
8 8 MIR3176 0.000 0.000 0.000 0.000
>
> (any.zero <- unfiltered[!apply(unfiltered[, -c(1,2)], 1, function(row) any(row == 0)), ])
ID GeneName DU145small DU145total PC3small PC3total
1 1 MIR22HG 33221.5 1224.55 2156.43 573.315
2 2 MIRLET7E 87566.1 7737.99 25039.30 16415.600
5 5 MIR181B2 70723.2 3958.01 6209.85 1399.340
> (all.zero <- unfiltered[!apply(unfiltered[, -c(1,2)], 1, function(row) all(row == 0)), ])
ID GeneName DU145small DU145total PC3small PC3total
1 1 MIR22HG 33221.500 1224.550 2156.430 573.315
2 2 MIRLET7E 87566.100 7737.990 25039.300 16415.600
3 3 MIR612 0.000 0.000 530.068 0.000
4 4 MIR218-1 0.000 0.000 1166.880 701.253
5 5 MIR181B2 70723.200 3958.010 6209.850 1399.340
7 7 MIR10B 787.516 330.556 0.000 20336.400
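On the efficiency concern raised in the question: apply calls the supplied function once per row, whereas rowSums handles the whole matrix in one vectorized call. The sketch below only checks that both routes produce the same row masks on the sample data; it makes no timing claim. The variable names (`values`, `any_apply`, and so on) are chosen here for illustration.

```r
unfiltered <- read.table(text = "
ID GeneName DU145small DU145total PC3small PC3total
1 MIR22HG 33221.5 1224.55 2156.43 573.315
2 MIRLET7E 87566.1 7737.99 25039.3 16415.6
3 MIR612 0 0 530.068 0
4 MIR218-1 0 0 1166.88 701.253
5 MIR181B2 70723.2 3958.01 6209.85 1399.34
6 MIR218-2 0 0 0 0
7 MIR10B 787.516 330.556 0 20336.4
8 MIR3176 0 0 0 0", header = TRUE)

# Measurement columns only (drop ID and GeneName)
values <- unfiltered[, -c(1, 2)]

# Row-by-row version, as in the answer above
any_apply <- apply(values, 1, function(row) any(row == 0))
all_apply <- apply(values, 1, function(row) all(row == 0))

# Vectorized version based on the per-row zero count
n_zero      <- rowSums(values == 0)
any_rowsums <- n_zero > 0
all_rowsums <- n_zero == ncol(values)

# Both approaches should flag exactly the same rows
same_any <- identical(unname(any_apply), unname(any_rowsums))
same_all <- identical(unname(all_apply), unname(all_rowsums))
```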