Subsetting a data frame

Date: 2011-03-18 08:49:48

Tags: r dataframe subset

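I think I have a simple question. In my data frame, I want to take the subset of rows where the column Quality_score equals one of: Perfect, Perfect***, Perfect****, Good, Good*** or Good****.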

This is my current solution:

Quality_scoreComplete <- subset(completefile, Quality_score == "Perfect" | Quality_score == "Perfect***" | Quality_score == "Perfect****" | Quality_score == "Good" | Quality_score == "Good***" | Quality_score == "Good****")

Is there a way to simplify this? Something like:

methods <- c('Perfect', 'Perfect***', 'Perfect****', 'Good', 'Good***', 'Good****')
Quality_scoreComplete <- subset(completefile,Quality_score==methods)

Thanks, everyone,

Lisanne

2 answers:

Answer 0: (score: 2)

You don't even need subset; see ?"[":

Quality_scoreComplete <- completefile[completefile$Quality_score %in% methods,]

Edited: following @Sacha Epskamp's kind comment, == in this expression gives wrong results, so I have corrected it to %in%. Thanks!
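If you would rather keep the original subset() call, the same %in% test should work there too. A minimal sketch on made-up data (the completefile and id column below are hypothetical toy values, not the asker's file):

```r
# Hypothetical toy data standing in for the asker's completefile
completefile <- data.frame(
  Quality_score = c("Perfect", "Good***", "Bad", "Perfect****", "Poor", "Good"),
  id = 1:6,
  stringsAsFactors = FALSE
)
methods <- c("Perfect", "Perfect***", "Perfect****", "Good", "Good***", "Good****")

# %in% checks each Quality_score for membership in methods
by_bracket <- completefile[completefile$Quality_score %in% methods, ]
by_subset  <- subset(completefile, Quality_score %in% methods)

print(by_bracket)
print(by_subset)
```

Both forms keep the same rows here; subset() is mostly a readability choice, since it lets you refer to the column without the completefile$ prefix.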

Example of the problem:

> x <- c(17, 19)
> cars[cars$speed==x,]
   speed dist
29    17   32
31    17   50
36    19   36
38    19   68
> cars[cars$speed %in% x,]
   speed dist
29    17   32
30    17   40
31    17   50
36    19   36
37    19   46
38    19   68
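The rows go missing because == compares element by element and recycles the shorter vector, so each speed is tested against only one of the two values. A small sketch with a hand-made speed vector (a stand-in for cars$speed, chosen only for illustration):

```r
x <- c(17, 19)
speed <- c(17, 17, 17, 19, 19, 19)  # illustrative stand-in for cars$speed

# == recycles x to 17, 19, 17, 19, 17, 19 and compares position by position
eq <- speed == x

# %in% asks, for each element of speed, whether it occurs anywhere in x
isin <- speed %in% x

print(eq)    # TRUE FALSE TRUE TRUE FALSE TRUE
print(isin)  # TRUE for every element
```

Which rows survive == depends on whether each row's position happens to line up with the matching value in the recycled vector, which is why rows 30 and 37 are dropped above.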

Answer 1: (score: 1)

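Another option is grepl, which searches each string for a pattern and returns a logical vector indicating whether the pattern was found. You can use the | operator inside the pattern to mean OR, and ignore.case to make the match case-insensitive: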

methods<-c('Perfect', 'Perfect*', 'Perfect*', 'Good', 'Good','Good*')

completefile <- data.frame( Quality_score = c( methods, "bad", "terrible", "abbysmal"), foo = 1)

subset(completefile, grepl("good|perfect", Quality_score, ignore.case=TRUE))
  Quality_score foo
1       Perfect   1
2      Perfect*   1
3      Perfect*   1
4          Good   1
5          Good   1
6         Good*   1

Edit: I see now that case sensitivity isn't an issue here (blame my dyslexia!). You can simplify to:

subset(completefile,grepl("Good|Perfect",Quality_score))
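One caveat: grepl("Good|Perfect", ...) keeps any string that contains either word, so a value such as "Not Good" would also pass. If that matters, an anchored pattern is one possible fix. The sketch below assumes the only allowed suffix is a run of asterisks; the scores vector is made up for illustration:

```r
# Made-up values; "Not Good" and "Goodish" are hypothetical near-misses
scores <- c("Perfect", "Perfect***", "Good****", "Not Good", "Goodish", "bad")

# Unanchored: TRUE wherever "Good" or "Perfect" appears anywhere in the string
loose <- grepl("Good|Perfect", scores)

# Anchored: the whole string must be Good or Perfect, then zero or more asterisks
strict <- grepl("^(Good|Perfect)\\**$", scores)

print(scores[loose])
print(scores[strict])
```

When the set of allowed values is known exactly, as in the question, %in% is probably the simpler and safer choice; the regex is more useful when the number of trailing asterisks can vary.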