Remove rows with fewer than 4 non-zero entries without using a loop

Time: 2013-10-23 14:05:41

Tags: r count vectorization zero

The data set looks like this:

"1" 10 40 "r" "q" "0" "r" "r" "0" "r" "0" "0" "0" "0" "0" "t" "q" "0" "0" "s" "0" "r" 0 "0" 0 "0" "0" 0 0 0 "0"
"2" 10 173 "s" "s" "s" "0" "0" "s" "s" "0" "t" "t" "s" "t" "t" "r" "s" "0" "q" "0" "0" 0 "0" 0 "0" "0" 0 0 0 "0"
"3" 10 2107 "t" "0" "0" "s" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" 0 "0" 0 "0" "0" 0 0 0 "0"
"4" 10 993 "s" "0" "q" "s" "s" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" 0 "0" 0 "0" "0" 0 0 0 "0"
"5" 10 1712 "t" "0" "s" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "s" "0" "t" "0" 0 "0" 0 "0" "0" 0 0 0 "0"
"6" 776 1872 "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" 0 "r" 0 "0" "0" 0 0 0 "s"

The output should be:

"1" 10 40 "r" "q" "0" "r" "r" "0" "r" "0" "0" "0" "0" "0" "t" "q" "0" "0" "s" "0" "r" 0 "0" 0 "0" "0" 0 0 0 "0"
"2" 10 173 "s" "s" "s" "0" "0" "s" "s" "0" "t" "t" "s" "t" "t" "r" "s" "0" "q" "0" "0" 0 "0" 0 "0" "0" 0 0 0 "0"
"4" 10 993 "s" "0" "q" "s" "s" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" 0 "0" 0 "0" "0" 0 0 0 "0"
"5" 10 1712 "t" "0" "s" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "s" "0" "t" "0" 0 "0" 0 "0" "0" 0 0 0 "0"

The code I tried is:

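x <- read.table("sample.txt")
nrowx <- nrow(x)
for (i in 1:nrowx) {
    # count the non-zero entries of row i in columns 3 to 30
    count <- 0
    for (j in 3:30) {
        if (x[i, j] != 0) {
            count <- count + 1
        }
    }
    # mark rows with fewer than 4 non-zero entries for removal
    if (count < 4) {
        x[i, ] <- NA
    }
}
# drop the rows that were marked with NA
x <- x[complete.cases(x), ]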

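Please suggest a way to do this that does not involve loops.

1 answer:

Answer 0 (score: 1)

It looks like none of your rows has fewer than four non-zero entries:

For example, to print the number of non-zero entries in each row, with tab being your table: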

apply(tab, 1, function(x)sum(x!="0"))
 [1] 12 16  5  7  7  5
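The same per-row count can probably also be obtained without apply(), by comparing the whole table to "0" and summing the resulting logical matrix with rowSums(). A minimal sketch, assuming a small made-up table (the data frame `toy` and its values are hypothetical, not the asker's file):

```r
# Hypothetical toy table; the values are made up for illustration only.
toy <- data.frame(
  a = c("r", "0", "s"),
  b = c("q", "0", "0"),
  c = c("0", "t", "0"),
  stringsAsFactors = FALSE
)

# toy != "0" gives a logical matrix (TRUE where an entry is non-zero);
# rowSums() then counts the TRUE values in each row.
nonzero <- rowSums(toy != "0")
print(nonzero)
```

Because the comparison is applied to the whole table at once, no explicit loop or per-row function call is needed.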

For example, to remove all rows with five or fewer non-zero entries, you can use:

tab[-which(apply(tab, 1, function(x)sum(x!="0"))<=5),]

However, I am not sure whether the first column in your data is treated as a column in the data frame.

Does this help?
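If the leading fields are identifiers rather than data, one option is to count only the remaining columns. A rough sketch using the six sample rows from the question, assuming that the first three fields (the row label and the two numbers) should not be counted and that every field is read as text; the names `rows`, `entries`, `nonzero` and `kept` are only illustrative:

```r
# The six sample rows from the question, one string per line.
rows <- c(
  '"1" 10 40 "r" "q" "0" "r" "r" "0" "r" "0" "0" "0" "0" "0" "t" "q" "0" "0" "s" "0" "r" 0 "0" 0 "0" "0" 0 0 0 "0"',
  '"2" 10 173 "s" "s" "s" "0" "0" "s" "s" "0" "t" "t" "s" "t" "t" "r" "s" "0" "q" "0" "0" 0 "0" 0 "0" "0" 0 0 0 "0"',
  '"3" 10 2107 "t" "0" "0" "s" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" 0 "0" 0 "0" "0" 0 0 0 "0"',
  '"4" 10 993 "s" "0" "q" "s" "s" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" 0 "0" 0 "0" "0" 0 0 0 "0"',
  '"5" 10 1712 "t" "0" "s" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "s" "0" "t" "0" 0 "0" 0 "0" "0" 0 0 0 "0"',
  '"6" 776 1872 "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" 0 "r" 0 "0" "0" 0 0 0 "s"'
)

# Read every field as character so that 0 and "0" compare the same way.
x <- read.table(text = rows, colClasses = "character")

# Assumption: V1..V3 are identifiers; V4..V31 hold the entries to count.
entries <- x[, 4:31]

# Number of non-zero entries per row, computed without an explicit loop.
nonzero <- rowSums(entries != "0")

# Logical indexing keeps the rows with at least four non-zero entries.
kept <- x[nonzero >= 4, ]
print(kept$V1)
```

Indexing with a logical vector, rather than with -which(...), also avoids a known pitfall: when no row matches the condition, which() returns an empty vector and the negative index then selects zero rows instead of all of them.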