Below is an example of what the data looks like. I need to replace all of these blanks with NA so that as.Date(dat[,i]) does not throw an error.
> dat[,i]
[1]
[28]
[55]
[82] 6/26/2007 7/5/2007 7/5/2007 12/6/2007 2/5/2008
[109] 3/27/2008 6/29/2008 9/16/2008 11/3/2008 9/11/2008 11/24/2008 12/29/2008 11/20/2008 1/26/2009 1/8/2009 3/5/2009
[136] 4/7/2009 6/9/2009 8/23/2009 8/16/2009 9/2/2009 10/6/2009 10/14/2009 10/24/2009 10/22/2009 11/5/2009 12/9/2009 2/4/2010 3/18/2010
[163] 7/8/2010 7/7/2010 7/29/2010 10/6/2010 10/7/2010 11/18/2010 1/12/2011 1/6/2011 4/5/2011 4/21/2011 5/25/2011 6/20/2011
[190] 12/12/2011 2/29/2012 2/22/2012 3/7/2012 3/28/2012 5/16/2012 5/23/2012 6/14/2012 8/14/2012 8/16/2012 9/5/2012 9/30/2012 11/5/2012 12/25/2012 12/27/2012 3/14/2013
[217] 7/24/2013 7/31/2013 9/2/2013 10/16/2013 10/30/2013 12/13/2013 2/24/2014 3/9/2014 6/29/2014 6/23/2014
[244] 9/1/2014 9/22/2014 9/22/2014 11/23/2014 2/24/2015 3/17/2015 4/8/2015 6/23/2015 6/23/2015 7/4/2015
[271] ...
[3538] 6/29/2012 11/16/2012 11/23/2012 9/1/2012
916 Levels: 10/10/2008 10/10/2009 10/10/2012 10/11/2010 10/11/2013 10/1/2010 10/12/2009 10/14/2009 10/14/2010 10/14/2011 10/14/2014 10/15/2009 10/15/2014 10/16/2013 10/17/2011 10/19/2009 10/19/2010 10/19/2011 10/20/2012 10/21/2008 10/21/2010 10/21/2013 10/2/2010 10/2/2012 10/2/2013 ... 9/9/2014
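A minimal base R sketch of the end goal, assuming the blanks are labels made only of whitespace (such as " ") and the dates are month/day/year like the listing above; the sample values are made up. Converting the factor to character first means we edit the labels rather than the integer codes. The explicit format matters too: without one, as.Date() only tries "%Y-%m-%d" and "%Y/%m/%d", so these m/d/Y strings would still fail after the blanks are gone.

```r
# Hypothetical column mimicking dat[,i]: whitespace-only blanks plus m/d/Y dates
x <- factor(c(" ", "", "6/26/2007", "2/29/2012"))

chr <- trimws(as.character(x))          # work on the labels and strip surrounding spaces
chr[chr == ""] <- NA                    # blank cells become NA
d <- as.Date(chr, format = "%m/%d/%Y")  # explicit format: these dates are month/day/year
d
```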
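But every element has the same class, "factor", as shown below for dat[,i][1] and dat[,i][3511]. Comparing the blank element with dat[,i][1] == "" returns FALSE, so how can I tell the blanks apart from the dates in order to use apply to put NA where it belongs?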
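A small reproduction of why the comparison fails, under the same assumption the answer makes: the blank level is a space (" "), not an empty string. The factor below is hypothetical. Comparing a factor with == compares its labels, so " " == "" is FALSE; trimming the labels first makes the blanks detectable.

```r
# Hypothetical factor: the blank cells carry the level " " (one space)
x <- factor(c(" ", "6/26/2007", " ", "2/20/2012"))

x[1] == ""                            # FALSE: the stored label is " ", not ""
nchar(as.character(x[1]))             # 1: one invisible character
blank <- trimws(as.character(x)) == ""
blank                                 # TRUE FALSE TRUE FALSE
which(blank)                          # 1 3
```

When doing this over a whole data frame, lapply() is usually a better fit than apply(), because apply() first converts the data frame to a matrix.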
> dat[,i][1]
[1]
916 Levels: 10/10/2008 10/10/2009 10/10/2012 10/11/2010 10/11/2013 10/1/2010 10/12/2009 10/14/2009 10/14/2010 10/14/2011 10/14/2014 10/15/2009 10/15/2014 10/16/2013 10/17/2011 10/19/2009 10/19/2010 10/19/2011 10/20/2012 10/21/2008 10/21/2010 10/21/2013 10/2/2010 10/2/2012 10/2/2013 ... 9/9/2014
> class(dat[,i][1])
[1] "factor"
> dat[,i][3511]
[1] 2/20/2012
916 Levels: 10/10/2008 10/10/2009 10/10/2012 10/11/2010 10/11/2013 10/1/2010 10/12/2009 10/14/2009 10/14/2010 10/14/2011 10/14/2014 10/15/2009 10/15/2014 10/16/2013 10/17/2011 10/19/2009 10/19/2010 10/19/2011 10/20/2012 10/21/2008 10/21/2010 10/21/2013 10/2/2010 10/2/2012 10/2/2013 ... 9/9/2014
> class(dat[,i][3511])
[1] "factor"
Also, trying to "go down a level" with [[ ]] or a second [ ] doesn't help; the result is still just a factor:
> dat[,i][[1]]
[1]
916 Levels: 10/10/2008 10/10/2009 10/10/2012 10/11/2010 10/11/2013 10/1/2010 10/12/2009 10/14/2009 10/14/2010 10/14/2011 10/14/2014 10/15/2009 10/15/2014 10/16/2013 10/17/2011 10/19/2009 10/19/2010 10/19/2011 10/20/2012 10/21/2008 10/21/2010 10/21/2013 10/2/2010 10/2/2012 10/2/2013 ... 9/9/2014
> dat[,i][1][1]
[1]
916 Levels: 10/10/2008 10/10/2009 10/10/2012 10/11/2010 10/11/2013 10/1/2010 10/12/2009 10/14/2009 10/14/2010 10/14/2011 10/14/2014 10/15/2009 10/15/2014 10/16/2013 10/17/2011 10/19/2009 10/19/2010 10/19/2011 10/20/2012 10/21/2008 10/21/2010 10/21/2013 10/2/2010 10/2/2012 10/2/2013 ... 9/9/2014
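The extra indexing does not help because a factor element is an integer code plus a levels attribute, and [[ on a factor keeps that attribute. A short sketch (hypothetical data) of ways to get at the underlying label instead, including dput(), which prints the levels in quotes and so makes a " " level visible:

```r
x <- factor(c(" ", "2/20/2012"))

class(x[[1]])                              # "factor": [[ on a factor keeps the levels
as.character(x[[1]])                       # " ": the label as a plain character string
identical(levels(x)[x], as.character(x))   # TRUE: indexing levels by the codes gives the labels
dput(x[1])                                 # shows the structure, quotes included
```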
Answer (score: 1)
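It would be better if you showed a dput of the example. Based on the OP's post, I assume the level is a space (' ') rather than an empty string (''). So we can trim the whitespace, which turns it into '', and then compare with ==: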
# Example data: the first element is a factor holding a single space
dat <- list(factor(' ', levels=c(' ', 1:5)), factor(1:5, levels=1:5))
library(stringr)
sapply(dat, function(x) any(str_trim(x)==''))
#[1] TRUE FALSE
Or, using grepl:
sapply(lapply(dat, grepl, pattern= '^\\s+$'), all)
#[1] TRUE FALSE
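The code above only flags which elements contain blanks. To go on and actually replace them, a sketch that reuses the answer's example data with a variant of its grepl pattern: '^\\s*$' also matches the empty string, which '^\\s+$' would miss. Returning character instead of factor is a choice made here, so that the result can go straight into as.Date() with an explicit format.

```r
# The answer's example data
dat <- list(factor(' ', levels = c(' ', 1:5)), factor(1:5, levels = 1:5))

# Replace whitespace-only (or empty) values with NA in every element.
# as.character() comes first so we edit the labels, not the integer codes.
dat[] <- lapply(dat, function(x) {
  x <- as.character(x)
  x[grepl('^\\s*$', x)] <- NA
  x
})
dat
```

If dat is a data frame instead of a list, the dat[] <- form keeps it a data frame.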