R: How do I remove "rows" of empty values?

Date: 2014-07-24 21:56:50

Tags: r dataframe

I have some odd data that contains a lot of empty values.

test=read.table("test.csv", sep=",", header=T)
class(test)
[1] "data.frame"
test[1:5]
     GO.0000075 GO.0000077 GO.0000082 GO.0002474 GO.0002478
1       CDC27      FEM1B       CUL2       CTSS      AP2A2
2       FEM1B      PSMA1      PSMA1      ITGAV       CTSS
3        NAE1      PSMA3      PSMA3      PSMA1     DYNLL1
4       PSMA1      PSMB5      PSMB5      PSMA3      ITGAV
5       PSMA3      PSMC1      PSMC1      PSMB5      KIF5A
6       PSMB5      PSMC5      PSMC5      PSMC1     KIFAP3
7       PSMC1      PSMC6      PSMC6      PSMC5      PSMA1
8       PSMC5      PSMD1      PSMD1      PSMC6      PSMA3
9       PSMC6     PSMD12     PSMD12      PSMD1      PSMB5
10      PSMD1     PSMD13     PSMD13     PSMD12      PSMC1
11     PSMD12     PSMD14     PSMD14     PSMD13      PSMC5
12     PSMD13      PSMD4      PSMD4     PSMD14      PSMC6
13     PSMD14      PSME3      PSME3      PSMD4      PSMD1
14      PSMD4     PTPN11                 PSME3     PSMD12
15      PSME3                                      PSMD13
16     PTPN11                                      PSMD14
17                                                  PSMD4
18                                                  PSME3
19                                                       
20                                                       
21                                                       
22                                                       
23                                                       
24                                                       
25                                                       
26                                                       
27                                                       
28                                                       
29                                                       
30                                                       
31                                                       
32                                                       
33                                                       
34         
nrow(test[1])
[1] 34

## I want the number of rows that actually hold a value: that is, 16 for the first column
## So I tried to remove the empty rows like this

test2<-test[-which(is.na(test)),]
test2
[1] GO.0000075 GO.0000077 GO.0000082 GO.0002474 GO.0002478 GO.0002479 GO.0006006 GO.0006007   ...
## another way..
test[test==""] <- NA
test
   GO.0000075 GO.0000077 GO.0000082 GO.0002474 GO.0002478 GO.0002479 GO.0006006 GO.0006007
1       CDC27      FEM1B       CUL2       CTSS      AP2A2      ITGAV      ALDOA      ALDOA
2       FEM1B      PSMA1      PSMA1      ITGAV       CTSS      PSMA1     ARPP19       ENO2
3        NAE1      PSMA3      PSMA3      PSMA1     DYNLL1      PSMA3       ENO2        GPI
4       PSMA1      PSMB5      PSMB5      PSMA3      ITGAV      PSMB5       GOT1        HK2
5       PSMA3      PSMC1      PSMC1      PSMB5      KIF5A      PSMC1       GOT2       IGF1
6       PSMB5      PSMC5      PSMC5      PSMC1     KIFAP3      PSMC5        GPI       LDHA
7       PSMC1      PSMC6      PSMC6      PSMC5      PSMA1      PSMC6        HK2       PFKP
8       PSMC5      PSMD1      PSMD1      PSMC6      PSMA3      PSMD1       IGF1      PGAM1
9       PSMC6     PSMD12     PSMD12      PSMD1      PSMB5     PSMD12       LDHA       TPI1
10      PSMD1     PSMD13     PSMD13     PSMD12      PSMC1     PSMD13       MDH1       <NA>
11     PSMD12     PSMD14     PSMD14     PSMD13      PSMC5     PSMD14       PFKP       <NA>
12     PSMD13      PSMD4      PSMD4     PSMD14      PSMC6      PSMD4      PGAM1       <NA>
13     PSMD14      PSME3      PSME3      PSMD4      PSMD1      PSME3     RANBP2       <NA>
14      PSMD4     PTPN11       <NA>      PSME3     PSMD12       <NA>       TPI1       <NA>
15      PSME3       <NA>       <NA>       <NA>     PSMD13       <NA>       <NA>       <NA>
16     PTPN11       <NA>       <NA>       <NA>     PSMD14       <NA>       <NA>       <NA>
17       <NA>       <NA>       <NA>       <NA>      PSMD4       <NA>       <NA>       <NA>
18       <NA>       <NA>       <NA>       <NA>      PSME3       <NA>       <NA>       <NA>
19       <NA>       <NA>       <NA>       <NA>       <NA>       <NA>       <NA>       <NA>
20       <NA>       <NA>       <NA>       <NA>       <NA>       <NA>       <NA>       <NA>
21       <NA>       <NA>       <NA>       <NA>       <NA>       <NA>       <NA>       <NA>
22       <NA>       <NA>       <NA>       <NA>       <NA>       <NA>       <NA>       <NA>
23       <NA>       <NA>       <NA>       <NA>       <NA>       <NA>       <NA>       <NA>
24       <NA>       <NA>       <NA>       <NA>       <NA>       <NA>       <NA>       <NA>
25       <NA>       <NA>       <NA>       <NA>       <NA>       <NA>       <NA>       <NA>
26       <NA>       <NA>       <NA>       <NA>       <NA>       <NA>       <NA>       <NA>
27       <NA>       <NA>       <NA>       <NA>       <NA>       <NA>       <NA>       <NA>
28       <NA>       <NA>       <NA>       <NA>       <NA>       <NA>       <NA>       <NA>
29       <NA>       <NA>       <NA>       <NA>       <NA>       <NA>       <NA>       <NA>
30       <NA>       <NA>       <NA>       <NA>       <NA>       <NA>       <NA>       <NA>
31       <NA>       <NA>       <NA>       <NA>       <NA>       <NA>       <NA>       <NA>
32       <NA>       <NA>       <NA>       <NA>       <NA>       <NA>       <NA>       <NA>
33       <NA>       <NA>       <NA>       <NA>       <NA>       <NA>       <NA>       <NA>
34       <NA>       <NA>       <NA>       <NA>       <NA>       <NA>       <NA>       <NA>

test<-na.omit(test)
test
GO.0000075 GO.0000077 GO.0000082 GO.0002474 GO.0002478 GO.0002479 GO.0006006 GO.0006007
1      CDC27      FEM1B       CUL2       CTSS      AP2A2      ITGAV      ALDOA      ALDOA
2      FEM1B      PSMA1      PSMA1      ITGAV       CTSS      PSMA1     ARPP19       ENO2
3       NAE1      PSMA3      PSMA3      PSMA1     DYNLL1      PSMA3       ENO2        GPI
  GO.0006091 GO.0006094 GO.0006096 GO.0006099 GO.0006106 GO.0006119 GO.0006120 GO.0006418
1      ACACB      ALDOA      ALDOA         FH         FH       BDNF       BDNF       KARS
2      ALDOA     ARPP19       ENO2      IDH3A       GOT1     NDUFA9     NDUFA9       NARS
3     ATP5A1       ENO2        GPI       LDLR       GOT2    NDUFAF1    NDUFAF1       PPA1

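I also tried excluding the blanks and using the complete.cases function to get the number of rows that have values (for example, nrow(test[1]) = 16), but it just gave me the same result.

What should I do?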

2 answers:

Answer 0: (score: 0)

Try something like this:

test[rowSums(is.na(test))!=ncol(test), ] # first set blank to NA

test[rowSums(test=="")!=ncol(test), ]
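As a rough illustration of the two lines above, here is a minimal sketch on a made-up data frame (the values are a small stand-in, not the real test.csv). It assumes the columns are character vectors padded with "" at the bottom. It also shows why na.omit() in the question left only a few rows: it drops every row that has at least one NA, not just the rows that are empty across all columns.

```r
# Toy stand-in for the question's data: ragged columns padded with "".
test <- data.frame(
  GO.0000075 = c("CDC27", "FEM1B", "NAE1", "", ""),
  GO.0000077 = c("FEM1B", "PSMA1", "", "", ""),
  stringsAsFactors = FALSE
)

# Variant 1: compare against "" directly (works while no NA is present).
kept_blank <- test[rowSums(test == "") != ncol(test), ]

# Variant 2: recode "" as NA first, then drop rows that are NA in every column.
test_na <- test
test_na[test_na == ""] <- NA
kept_na <- test_na[rowSums(is.na(test_na)) != ncol(test_na), ]

nrow(kept_blank)  # rows with at least one value
nrow(kept_na)     # same rows, via the NA route

# na.omit() is stricter: it removes any row containing at least one NA.
nrow(na.omit(test_na))

# Number of non-empty values per column, which is what the question asks for.
colSums(!is.na(test_na))
```

If the goal is only the count for one column (16 for GO.0000075 in the question), the colSums() line may be all that is needed, and no rows have to be removed.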

Answer 1: (score: 0)

After reading a CSV file into R, I ended up with a lot of blank rows at the bottom.

One solution that seems to work for me is to pick a column that I know will never be blank in a real row, and then filter on it.
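The answer's own code is not available here, so the following is only a sketch of what filtering on a never-blank column might look like. The data frame is made up, and treating GO.0000075 as the column that is never blank is purely an assumption for illustration.

```r
# Hypothetical data: GO.0000075 is assumed to be filled in every real row.
test <- data.frame(
  GO.0000075 = c("CDC27", "FEM1B", "NAE1", "", ""),
  GO.0000077 = c("FEM1B", "", "", "", ""),
  stringsAsFactors = FALSE
)

# Keep only the rows where the key column holds a value (neither NA nor "").
key <- test$GO.0000075
filtered <- test[!is.na(key) & key != "", ]

nrow(filtered)
```

This only helps when such a column exists. In the question's data every column has a different length, so the longest column would have to play that role.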