Suppose I have dummy data like this:
x1 x2 x3 x4 x5 x6
26 25 30 0 23 27
24 0 26 0 22 30
20 0 24 0 21 21
27 0 26 0 27 25
22 0 0 0 28 22
20 0 0 0 24 20
22 0 0 0 20 27
22 0 0 0 23 28
30 0 0 0 27 24
23 0 0 0 24 22
26 0 0 0 26 26
I need to clean this data:
1. delete all columns that contain only zeros (for example, x4)
2. delete all columns with fewer than 5 non-zero values (x2 and x3).
Is it possible to write a function or a loop for this?
Answer 0: (score: 5)
Try this:
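# sample data
x1 <- c(0, 2, 5, 7, 2, 3, 0, 3)
x2 <- c(2, 3, 0, 0, 1, 0, 4, 0)
x3 <- c(0, 0, 0, 0, 0, 0, 0, 0)
x4 <- c(2, 5, 1, 2, 3, 4, 5, 6)
df <- data.frame(x1, x2, x3, x4)
# colSums(df != 0) counts the non-zero values in each column;
# drop every column where that count is below 5
df <- df[, !(colSums(df != 0) < 5)]
# same result, just with the logic inverted: keep columns with at least 5 non-zero values
df <- df[, colSums(df != 0) >= 5]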
So this data frame
> df
x1 x2 x3 x4
1 0 2 0 2
2 2 3 0 5
3 5 0 0 1
4 7 0 0 2
5 2 1 0 3
6 3 0 0 4
7 0 4 0 5
8 3 0 0 6
becomes this:
> df
x1 x4
1 0 2
2 2 5
3 5 1
4 7 2
5 2 3
6 3 4
7 0 5
8 3 6
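Since the question asks for a function, the same `colSums` test can be wrapped in one. This is only a sketch, not part of the original answer: the name `drop_sparse_columns`, its `min_nonzero` argument, and the choice to count `NA` as not non-zero are assumptions made for illustration. A column of all zeros has 0 non-zero values, so the single threshold covers both rules from the question.

```r
# Hypothetical helper (name and arguments are illustrative):
# keep only the columns with at least `min_nonzero` non-zero values.
drop_sparse_columns <- function(data, min_nonzero = 5) {
  # Count non-zero entries per column; NA is not counted (an assumption).
  keep <- colSums(data != 0, na.rm = TRUE) >= min_nonzero
  # drop = FALSE keeps the result a data frame even if one column survives.
  data[, keep, drop = FALSE]
}

# The data from the question
dat <- data.frame(
  x1 = c(26, 24, 20, 27, 22, 20, 22, 22, 30, 23, 26),
  x2 = c(25, rep(0, 10)),
  x3 = c(30, 26, 24, 26, rep(0, 7)),
  x4 = rep(0, 11),
  x5 = c(23, 22, 21, 27, 28, 24, 20, 23, 27, 24, 26),
  x6 = c(27, 30, 21, 25, 22, 20, 27, 28, 24, 22, 26)
)

cleaned <- drop_sparse_columns(dat)
names(cleaned)  # x1, x5 and x6 remain
```

Here x2 (1 non-zero value), x3 (4) and x4 (0) fall below the threshold and are removed, while all 11 rows are kept.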