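I have a large dataset in which each row may have one cell containing text while the rest are empty. Is there a way to subset only those rows that have text in a given column and are empty in all the other columns?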
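I know I can use, for example,
tmp[tmp$A!="" & tmp$B=="" & tmp$C=="" & tmp$D=="",]
but I have about 30 columns and want to run this for each column, which would be rather tedious. I tried the following, but it did not work as expected.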
tmp=data.frame(A=c("a","","","",""),
B=c("","b","","",""),
C=c("","","c","",""),
D=c("","","","D",""))
#Attempting subsetting across multiple columns with tmp[,2:3]
tmp[tmp[,1]!="" & tmp[,2:3]=="",]
A B C D
1 a
NA <NA> <NA> <NA> <NA>
#But it results in creating rows with na
tmp[tmp[,1]!="" & tmp[,2:4]=="",]
A B C D
1 a
NA <NA> <NA> <NA> <NA>
NA.1 <NA> <NA> <NA> <NA>
I would just like to end up with:
A B C D
1 a
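Then I want to use that in an ifelse(), so that if only column A has a cell with text, column E gets the text A, and if only column B has text, column E gets the text B: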
A B C D E
a       A
  b     B
    c   C
      d D
Any suggestions?
Answer 0 (score: 2)
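In your example data frame the character vectors are converted to factors (the data.frame() default before R 4.0.0). You can pass stringsAsFactors=FALSE when building the example data frame to turn off this default behavior: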
tmp=data.frame(A=c("a","","","",""),
B=c("","b","","",""),
C=c("","","c","",""),
D=c("","","","D",""),stringsAsFactors=FALSE)
Then you can get what you expect:
kk<-tmp[tmp[,1]!="",]
> kk
A B C D
1 a
ll<-tmp[tmp[,2]!="",]
> ll
A B C D
2 b
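The NA rows in the question most likely come from the index itself: tmp[,2:3]=="" is a two-column logical matrix, so combining it with & yields a matrix with more elements than tmp has rows, and every TRUE beyond row 5 selects a non-existent row. A minimal sketch of a per-column subset that avoids this, assuming empty cells are "" and there are no NA values (filled and only_in are names introduced here for illustration, not part of the original answer):

```r
tmp <- data.frame(A = c("a", "", "", "", ""),
                  B = c("", "b", "", "", ""),
                  C = c("", "", "c", "", ""),
                  D = c("", "", "", "D", ""),
                  stringsAsFactors = FALSE)

# Logical matrix with one cell per data cell: TRUE where the cell holds text
filled <- tmp != ""

# Hypothetical helper: rows where `col` has text and it is the only filled cell
only_in <- function(col) {
  tmp[filled[, col] & rowSums(filled) == 1, , drop = FALSE]
}

only_in("A")  # the row whose only text is in column A
only_in("B")  # the row whose only text is in column B
```

Because rowSums() collapses the matrix to one value per row, the index has exactly as many elements as tmp has rows, however many columns there are.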
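# names(is.na(c(tmp))) is simply the column names c("A","B","C","D");
# this assignment works only because row i has its text in column i
tmp[1:4,"E"]<-names(is.na(c(tmp)))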
tmp
A B C D E
1 a A
2 b B
3 c C
4 D D
5 <NA>
na.omit(tmp)
A B C D E
1 a A
2 b B
3 c C
4 D D
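The assignment above relies on the text sitting on the diagonal of the example. For data where the filled column varies from row to row, a more general sketch might look like the following, assuming empty cells are "" with no NA values (filled and one are illustrative names, and max.col() is swapped in here in place of the ifelse() chain from the question):

```r
tmp <- data.frame(A = c("a", "", "", "", ""),
                  B = c("", "b", "", "", ""),
                  C = c("", "", "c", "", ""),
                  D = c("", "", "", "D", ""),
                  stringsAsFactors = FALSE)

# TRUE where a cell holds text
filled <- tmp[, c("A", "B", "C", "D")] != ""

# Rows with exactly one filled cell
one <- rowSums(filled) == 1

# E gets the name of the single filled column; other rows stay NA
tmp$E <- NA_character_
idx <- max.col(filled[one, , drop = FALSE] * 1, ties.method = "first")
tmp$E[one] <- colnames(filled)[idx]

tmp
```

Rows that are entirely empty, or that have text in more than one column, keep NA in E and can be dropped with na.omit(tmp) as shown above.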
............................................... .......................
Original answer:
Using str(tmp):
str(tmp)
'data.frame': 5 obs. of 4 variables:
$ A: Factor w/ 2 levels "","a": 2 1 1 1 1
$ B: Factor w/ 2 levels "","b": 1 2 1 1 1
$ C: Factor w/ 2 levels "","c": 1 1 2 1 1
$ D: Factor w/ 2 levels "","D": 1 1 1 2 1
So,
levels(tmp[,1])
[1] "" "a"
Therefore, you need to use levels(tmp[,1])==""
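One caveat worth checking: comparing levels() tests the distinct levels of the factor, not the value in each row. A small sketch of the difference, using a standalone factor f (a name introduced here for illustration):

```r
# A factor like column A of the example data frame
f <- factor(c("a", "", "", "", ""))

# One result per level ("" and "a"), not one per row
by_level <- levels(f) == ""

# One result per row: convert to character first
by_row <- as.character(f) == ""

by_level
by_row
```

For row-wise subsetting, the per-row comparison is the one whose length matches the number of rows in the data frame.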
............................................... .................