How to tell whether a factor element has no value in R

Date: 2016-01-30 08:33:10

Tags: r

Below is an example of what the data looks like. I need to replace all of these blanks with NA so that as.Date(dat[,i]) does not produce an error.

> dat[,i]
   [1]                                                                                                                                                                                                                                                                                                         
  [28]                                                                                                                                                                                                                                                                                                         
  [55]                                                                                                                                                                                                                                                                                                         
  [82]                                                                                                    6/26/2007             7/5/2007                         7/5/2007                                                                     12/6/2007                                              2/5/2008  
 [109]            3/27/2008                        6/29/2008  9/16/2008  11/3/2008                                                                                          9/11/2008  11/24/2008 12/29/2008 11/20/2008 1/26/2009  1/8/2009                                               3/5/2009             
 [136] 4/7/2009              6/9/2009   8/23/2009  8/16/2009             9/2/2009              10/6/2009  10/14/2009 10/24/2009 10/22/2009 11/5/2009                        12/9/2009                        2/4/2010                                                          3/18/2010                       
 [163]            7/8/2010   7/7/2010   7/29/2010             10/6/2010  10/7/2010  11/18/2010                       1/12/2011  1/6/2011                                    4/5/2011   4/21/2011             5/25/2011             6/20/2011                                                                   
 [190]                       12/12/2011 2/29/2012             2/22/2012  3/7/2012              3/28/2012             5/16/2012  5/23/2012  6/14/2012                                              8/14/2012  8/16/2012  9/5/2012   9/30/2012  11/5/2012                        12/25/2012 12/27/2012 3/14/2013 
 [217]                                                        7/24/2013  7/31/2013             9/2/2013   10/16/2013            10/30/2013                                  12/13/2013            2/24/2014  3/9/2014                                               6/29/2014  6/23/2014                       
 [244]                       9/1/2014   9/22/2014  9/22/2014  11/23/2014            2/24/2015             3/17/2015  4/8/2015                         6/23/2015  6/23/2015  7/4/2015                                                                                                                           
 [271]                                                                                                                                                          ...                                                                                                                                               

[3538]                       6/29/2012  11/16/2012 11/23/2012 9/1/2012                        
916 Levels:   10/10/2008 10/10/2009 10/10/2012 10/11/2010 10/11/2013 10/1/2010 10/12/2009 10/14/2009 10/14/2010 10/14/2011 10/14/2014 10/15/2009 10/15/2014 10/16/2013 10/17/2011 10/19/2009 10/19/2010 10/19/2011 10/20/2012 10/21/2008 10/21/2010 10/21/2013 10/2/2010 10/2/2012 10/2/2013 ... 9/9/2014

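Yet every element in there has the same data type, "factor". Comparing against "" (for example dat[,i][1] == "") returns FALSE for both dat[,i][1] and dat[,i][3511], so how can I tell them apart so that I can use apply to put NA where it needs to go?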

> dat[,i][1]
[1]  
916 Levels:   10/10/2008 10/10/2009 10/10/2012 10/11/2010 10/11/2013 10/1/2010 10/12/2009 10/14/2009 10/14/2010 10/14/2011 10/14/2014 10/15/2009 10/15/2014 10/16/2013 10/17/2011 10/19/2009 10/19/2010 10/19/2011 10/20/2012 10/21/2008 10/21/2010 10/21/2013 10/2/2010 10/2/2012 10/2/2013 ... 9/9/2014
> class(dat[,i][1])
[1] "factor"

> dat[,i][3511]
[1] 2/20/2012
916 Levels:   10/10/2008 10/10/2009 10/10/2012 10/11/2010 10/11/2013 10/1/2010 10/12/2009 10/14/2009 10/14/2010 10/14/2011 10/14/2014 10/15/2009 10/15/2014 10/16/2013 10/17/2011 10/19/2009 10/19/2010 10/19/2011 10/20/2012 10/21/2008 10/21/2010 10/21/2013 10/2/2010 10/2/2012 10/2/2013 ... 9/9/2014
> class(dat[,i][3511])
[1] "factor"

Also, trying to "go down a level" does nothing; the result is still just a factor:

> dat[,i][[1]]
[1]  
916 Levels:   10/10/2008 10/10/2009 10/10/2012 10/11/2010 10/11/2013 10/1/2010 10/12/2009 10/14/2009 10/14/2010 10/14/2011 10/14/2014 10/15/2009 10/15/2014 10/16/2013 10/17/2011 10/19/2009 10/19/2010 10/19/2011 10/20/2012 10/21/2008 10/21/2010 10/21/2013 10/2/2010 10/2/2012 10/2/2013 ... 9/9/2014

> dat[,i][1][1]
[1]  
916 Levels:   10/10/2008 10/10/2009 10/10/2012 10/11/2010 10/11/2013 10/1/2010 10/12/2009 10/14/2009 10/14/2010 10/14/2011 10/14/2014 10/15/2009 10/15/2014 10/16/2013 10/17/2011 10/19/2009 10/19/2010 10/19/2011 10/20/2012 10/21/2008 10/21/2010 10/21/2013 10/2/2010 10/2/2012 10/2/2013 ... 9/9/2014

1 answer:

Answer 0 (score: 1)

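It would be better to show the dput of the example. Based on the OP's post, I am assuming the blank level is a whitespace string (' ') rather than an empty string (''). So we can trim the whitespace to turn it into '' and then use ==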

library(stringr)
# str_trim() strips surrounding whitespace; TRUE where exactly one entry is then ''
sapply(dat, function(x) sum(str_trim(x)=='')==1)
#[1]  TRUE FALSE

Or use grepl

# TRUE for list elements whose entries all consist solely of whitespace
sapply(lapply(dat, grepl, pattern= '^\\s+$'), all)
#[1]  TRUE FALSE
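
Both checks above report whether a whole list element is blank. The question asks for something slightly different: putting NA at each blank position inside one column so that as.Date() can run. A minimal sketch of that step follows, assuming the blanks are whitespace-only strings and the dates are in month/day/year form as in the printed output. The vector x is a hypothetical stand-in for dat[,i], not the OP's data.

```r
# Hypothetical stand-in for one column of `dat`: a factor whose "empty"
# entries are whitespace-only (or empty) strings, as the answer assumes.
x <- factor(c(" ", "6/26/2007", "  ", "7/5/2007", ""))

# Work on the character values rather than the factor's integer codes,
# and strip leading/trailing whitespace (trimws() is in base R >= 3.2.0).
chr <- trimws(as.character(x))

# Anything that is empty after trimming becomes NA.
chr[chr == ""] <- NA

# Parse the remaining strings as month/day/year dates; NA stays NA.
dates <- as.Date(chr, format = "%m/%d/%Y")
```

Because the comparison is vectorised, no apply call is needed for a single column. For several columns the same three steps could be wrapped in a function and passed to lapply.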

Data

dat <- list(factor(' ', levels=c(' ', 1:5)), factor(1:5, levels=1:5))
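
The data above was built by hand because the question did not include a reproducible sample. As a rough illustration of why dput output helps here, the sketch below uses a hypothetical two-element factor (not the OP's column) to show ways of inspecting what a blank level really contains.

```r
# Hypothetical small factor mixing a whitespace-only entry with a real date.
x <- factor(c(" ", "6/26/2007"))

# dput() prints R code that rebuilds the object, so each level appears as a
# quoted string and " " can be told apart from "".
dput(x)

# levels() returns the level strings themselves for a quick look.
lv <- levels(x)
print(lv)

# nchar() counts the characters in each level; a blank level with a
# nonzero count is whitespace rather than an empty string.
n <- nchar(lv)
print(n)
```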