Question

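I am trying to extract the meaningful rows from a large data frame with 2k columns and 1m rows, in which many of the columns have values equal to zero. A row counts as meaningful if most of its columns are non-zero.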

As a toy example:

difference_from_mean <- data.frame('col1' = c(-1, 1.5, -1, 1.2, 1), 'col2' = c(1, -0.5, 0, -4, 0), 'col3' = c(0, 1, 0, 1, 0), 'col4' = c(0, 0, 2, 1, 0))

difference_from_mean
  col1 col2 col3 col4
1 -1.0  1.0    0    0
2  1.5 -0.5    1    0
3 -1.0  0.0    0    2
4  1.2 -4.0    1    1
5  1.0  0.0    0    0

The desired result is:

> difference_from_mean_filtered
  col1 col2 col3 col4
1 -1.0  1.0    0    0
2  1.5 -0.5    1    0
3 -1.0  0.0    0    2
4  1.2 -4.0    1    1

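I tried rowSums, but because the values can be negative, the sums for many rows come out at or near zero, so it does not work. The above is just a toy example. What I am looking for is a way to put the number of zero-valued columns into a new column, which would help with subsetting the df (I also tried matching on strings, but that counts the 0 inside decimals as well, so it did not work either).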
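For reference, one way to get the zero count described above is to compare the data frame against zero first and only then sum, so that negative and positive values cannot cancel out. The following is a minimal sketch on the toy data, not a tested solution for the full 2k x 1m case. The names `value_cols`, `n_zero` and `max_zero` are hypothetical, and the cutoff of 2 is an assumption chosen only because it reproduces the desired output.

```r
# Toy data from the question
difference_from_mean <- data.frame(
  col1 = c(-1, 1.5, -1, 1.2, 1),
  col2 = c(1, -0.5, 0, -4, 0),
  col3 = c(0, 1, 0, 1, 0),
  col4 = c(0, 0, 2, 1, 0)
)

# Columns to inspect (hypothetical name; here, all of them)
value_cols <- names(difference_from_mean)

# `== 0` yields a logical matrix; rowSums() then counts the TRUEs per row,
# so the sign of the original values no longer matters
n_zero <- rowSums(difference_from_mean[value_cols] == 0)

# Assumed cutoff: keep rows with at most 2 zero-valued columns
max_zero <- 2
difference_from_mean_filtered <- difference_from_mean[n_zero <= max_zero, ]
```

The count can also be stored as a column with `difference_from_mean$n_zero <- n_zero` and used for subsetting later. On the full data the intermediate logical matrix has as many cells as the data frame itself, so memory may become a concern and processing the rows in chunks might be needed.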

Select rows in a data frame that are almost all non-zero

0 answers: