R - Filter rows by checking all columns for a partial text match

Date: 2016-02-21 09:31:38

Tags: r

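Hello, I am new to R. I cannot find a way to check all columns of each row for a word and then keep only the rows in which that word appears at least once in any column. I have created a sample data frame to show what my data looks like.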

> df
   Name currrent.Category Category.Month.1 Category.Month.2 Category.Month.3
1 Fund1      Abc Cautious     Abc Cautious     Abc Cautious     Abc Cautious
2 Fund2      Abc Cautious       Abc Global     Abc Cautious     Abc Cautious
3 Fund3        Abc Global       Abc Global       Abc Global       Abc Global
4 Fund4        Abc Global     Abc Cautious       Abc Global       Abc Global

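Now I want to extract all rows that contain the word "Cautious" in any column. The returned data frame should therefore contain rows 1, 2 and 4. I have added Abc to every category because the category names in my real data are longer and differ in some respects; what matters is whether they contain the word "Cautious" or not.

Is such an operation possible in R?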

> dput(df)
structure(list(Name = structure(1:4, .Label = c("Fund1", "Fund2", 
"Fund3", "Fund4"), class = "factor"), currrent.Category = structure(c(1L, 
1L, 2L, 2L), .Label = c("Abc Cautious", "Abc Global"), class = "factor"), 
Category.Month.1 = structure(c(1L, 2L, 2L, 1L), .Label = c("Abc Cautious", 
"Abc Global"), class = "factor"), Category.Month.2 = structure(c(1L, 
1L, 2L, 2L), .Label = c("Abc Cautious", "Abc Global"), class = "factor"), 
Category.Month.3 = structure(c(1L, 1L, 2L, 2L), .Label = c("Abc Cautious", 
"Abc Global"), class = "factor")), .Names = c("Name", "currrent.Category", 
"Category.Month.1", "Category.Month.2", "Category.Month.3"), class = "data.frame", row.names = c(NA, 
-4L))

I hope this is the correct way to post the output of dput().

3 Answers:

Answer 0: (score: 3)

Your data is not tidy, which is why you are having trouble working with it. I can see two variables in your data: a season and the status for that season.

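gather comes from the tidyr package; filter and the magrittr pipe operator (%>%) come from the dplyr package. I use the right assignment operator -> to keep the data flowing from left to right.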

library(tidyr)
library(dplyr)

df %>%
  gather(season, status, -Name) %>% 
  filter(grepl("Cautious", status)) ->
  dcautious

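You can then add, for example, group_by(Name) %>% summarise(ncautious=n()) to get a list of the funds together with the number of "Cautious" entries each one has in the data set.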
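The counting step suggested above can be sketched as follows. This is a minimal example, not part of the original answer: the data frame is rebuilt with character columns instead of factors, and the names counts and wide_cautious, as well as the semi_join step that recovers the original wide rows, are assumptions added for illustration.

```r
library(tidyr)
library(dplyr)

# Sample data rebuilt from the question (character columns, an assumption)
df <- data.frame(
  Name = c("Fund1", "Fund2", "Fund3", "Fund4"),
  currrent.Category = c("Abc Cautious", "Abc Cautious", "Abc Global", "Abc Global"),
  Category.Month.1 = c("Abc Cautious", "Abc Global", "Abc Global", "Abc Cautious"),
  Category.Month.2 = c("Abc Cautious", "Abc Cautious", "Abc Global", "Abc Global"),
  Category.Month.3 = c("Abc Cautious", "Abc Cautious", "Abc Global", "Abc Global"),
  stringsAsFactors = FALSE
)

# Long format: one row per fund and season, then count the "Cautious" rows per fund
counts <- df %>%
  gather(season, status, -Name) %>%
  filter(grepl("Cautious", status)) %>%
  group_by(Name) %>%
  summarise(ncautious = n())

# Hypothetical extra step: keep the original wide rows of the matching funds
wide_cautious <- df %>%
  semi_join(counts, by = "Name")
```

Funds without any "Cautious" entry (Fund3 here) do not appear in counts at all, so the semi_join returns the wide rows for Fund1, Fund2 and Fund4 only.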

Answer 1: (score: 3)

base R

# Flag rows that contain "Cautious" at least once in any column
sub <- apply(df, 1, function(row) length(grep("Cautious", row)) > 0) 

# Subset df
df[sub,]
#   Name currrent.Category Category.Month.1 Category.Month.2 Category.Month.3
#1 Fund1      Abc Cautious     Abc Cautious     Abc Cautious     Abc Cautious
#2 Fund2      Abc Cautious       Abc Global     Abc Cautious     Abc Cautious
#4 Fund4        Abc Global     Abc Cautious       Abc Global       Abc Global
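A column-wise variant of the same idea is sketched below as an alternative, not as the answer's own method. apply() first converts the data frame to a character matrix; checking each column with grepl() and combining the results with Reduce() avoids that conversion. The names keep and result are made up for this example, and fixed = TRUE is an assumption that the search term is a literal string rather than a regular expression.

```r
# Sample data rebuilt from the question (character columns, an assumption)
df <- data.frame(
  Name = c("Fund1", "Fund2", "Fund3", "Fund4"),
  currrent.Category = c("Abc Cautious", "Abc Cautious", "Abc Global", "Abc Global"),
  Category.Month.1 = c("Abc Cautious", "Abc Global", "Abc Global", "Abc Cautious"),
  Category.Month.2 = c("Abc Cautious", "Abc Cautious", "Abc Global", "Abc Global"),
  Category.Month.3 = c("Abc Cautious", "Abc Cautious", "Abc Global", "Abc Global"),
  stringsAsFactors = FALSE
)

# One logical vector per column: does each cell contain the literal text?
hits <- lapply(df, grepl, pattern = "Cautious", fixed = TRUE)

# Combine the columns with OR: TRUE where any column of the row matched
keep <- Reduce(`|`, hits)

# Subset the data frame to the matching rows
result <- df[keep, ]
```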

Answer 2: (score: 1)

Using the sqldf package:

library(sqldf)
sqldf("select * from df where 
[Name] like '%Cautious%' or 
[currrent.Category] like '%Cautious%' 
or [Category.Month.1] like '%Cautious%' 
or [Category.Month.2] like '%Cautious%' 
or [Category.Month.3] like '%Cautious%'")
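With many columns, writing one like condition per column by hand becomes tedious. As a rough sketch (an addition, not part of the original answer), the same where clause can be assembled from names(df); the variable names conditions, query and result are assumptions for illustration.

```r
library(sqldf)

# Sample data rebuilt from the question (character columns, an assumption)
df <- data.frame(
  Name = c("Fund1", "Fund2", "Fund3", "Fund4"),
  currrent.Category = c("Abc Cautious", "Abc Cautious", "Abc Global", "Abc Global"),
  Category.Month.1 = c("Abc Cautious", "Abc Global", "Abc Global", "Abc Cautious"),
  Category.Month.2 = c("Abc Cautious", "Abc Cautious", "Abc Global", "Abc Global"),
  Category.Month.3 = c("Abc Cautious", "Abc Cautious", "Abc Global", "Abc Global"),
  stringsAsFactors = FALSE
)

# One "[column] like '%Cautious%'" condition per column; %% is a literal % in sprintf
conditions <- sprintf("[%s] like '%%Cautious%%'", names(df))

# Join the conditions with "or" and prepend the select statement
query <- paste("select * from df where", paste(conditions, collapse = " or "))

result <- sqldf(query)
```

Note that with the default SQLite backend, like is case-insensitive for ASCII characters, so this query would also match "cautious".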