Extract columns that contain a specific word

Time: 2019-06-18 10:16:01

Tags: r

My file looks like this:

ball    cat     bird    ball    cat     cat     ball
apple   mouse   apple   apple   mouse   mouse   apple
cat     bat     mouse   cat     bat     bat     cat
mouse   ball    bat     ball    ball    ball    ball
bat     ball    mouse   bat     bat     bat     bat
bird    ball    ball    bird    bird    bird    bird

I want to extract the columns that contain the word "apple".

Expected output:

ball    bird    ball    ball
apple   apple   apple   apple
cat     mouse   cat     cat
mouse   bat     ball    ball
bat     mouse   bat     bat
bird    ball    bird    bird

3 answers:

Answer 0 (score: 2)

There are many ways to do this, and I suspect it has already been answered somewhere.

1) Using colSums

df[colSums(df == "apple") > 0]

#     V1    V3    V4    V7
#1  ball  bird  ball  ball
#2 apple apple apple apple
#3   cat mouse   cat   cat
#4 mouse   bat  ball  ball
#5   bat mouse   bat   bat
#6  bird  ball  bird  bird

2) With apply

df[apply(df == "apple", 2, any)]

3) Using Filter

Filter(function(x) any(x == "apple"), df)

4) dplyr

library(dplyr)
df %>% select_if(~any(. == "apple"))
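
select_if() still works, but dplyr 1.0.0 and later mark the scoped verbs as superseded in favor of select(where(...)). A minimal sketch of the same filter in that style, assuming dplyr >= 1.0.0 is installed; the small data frame toy_df is made up for illustration and is not the question's data:

```r
library(dplyr)

# Hypothetical toy data, not the data from the question
toy_df <- data.frame(
  V1 = c("ball", "apple", "cat"),
  V2 = c("cat", "mouse", "bat"),
  V3 = c("bird", "apple", "mouse"),
  stringsAsFactors = FALSE
)

# Keep only the columns in which at least one value equals "apple"
toy_result <- toy_df %>% select(where(~ any(. == "apple")))

names(toy_result)
```

Here V2 is dropped because none of its values is "apple".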

Data

df <- structure(list(V1 = structure(c(2L, 1L, 5L, 6L, 3L, 4L), .Label = c("apple", 
"ball", "bat", "bird", "cat", "mouse"), class = "factor"), V2 = structure(c(3L, 
4L, 2L, 1L, 1L, 1L), .Label = c("ball", "bat", "cat", "mouse"
), class = "factor"), V3 = structure(c(4L, 1L, 5L, 3L, 5L, 2L
), .Label = c("apple", "ball", "bat", "bird", "mouse"), class = "factor"), 
V4 = structure(c(2L, 1L, 5L, 2L, 3L, 4L), .Label = c("apple", 
"ball", "bat", "bird", "cat"), class = "factor"), V5 = structure(c(4L, 
5L, 2L, 1L, 2L, 3L), .Label = c("ball", "bat", "bird", "cat", 
"mouse"), class = "factor"), V6 = structure(c(4L, 5L, 2L, 
1L, 2L, 3L), .Label = c("ball", "bat", "bird", "cat", "mouse"
), class = "factor"), V7 = structure(c(2L, 1L, 5L, 2L, 3L, 
4L), .Label = c("apple", "ball", "bat", "bird", "cat"), class = "factor")), 
class = "data.frame", row.names = c(NA, -6L))

Answer 1 (score: 1)

We can use sapply from base R

df[sapply(df, function(x)  'apple' %in% x)]

Data

df <- structure(list(V1 = structure(c(2L, 1L, 5L, 6L, 3L, 4L), .Label = c("apple", 
"ball", "bat", "bird", "cat", "mouse"), class = "factor"), V2 = structure(c(3L, 
4L, 2L, 1L, 1L, 1L), .Label = c("ball", "bat", "cat", "mouse"
), class = "factor"), V3 = structure(c(4L, 1L, 5L, 3L, 5L, 2L
), .Label = c("apple", "ball", "bat", "bird", "mouse"), class = "factor"), 
    V4 = structure(c(2L, 1L, 5L, 2L, 3L, 4L), .Label = c("apple", 
    "ball", "bat", "bird", "cat"), class = "factor"), V5 = structure(c(4L, 
    5L, 2L, 1L, 2L, 3L), .Label = c("ball", "bat", "bird", "cat", 
    "mouse"), class = "factor"), V6 = structure(c(4L, 5L, 2L, 
    1L, 2L, 3L), .Label = c("ball", "bat", "bird", "cat", "mouse"
    ), class = "factor"), V7 = structure(c(2L, 1L, 5L, 2L, 3L, 
    4L), .Label = c("apple", "ball", "bat", "bird", "cat"), 
    class = "factor")), class = "data.frame", row.names = c(NA, 
-6L))

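Answer 2 (score: 0)

If the columns may contain a large number of values and processing time or performance becomes a concern, you can first convert the columns to factors and look for a match in their levels, rather than scanning every value in the dataset.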
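
The answer gives no code, so the following is only a sketch of what the idea might look like. The helper has_level and the data frame fdf are hypothetical names introduced here, and the data is a made-up toy example:

```r
# Hypothetical toy data, with every column stored as a factor
fdf <- data.frame(
  V1 = factor(c("ball", "apple", "cat")),
  V2 = factor(c("cat", "mouse", "bat")),
  V3 = factor(c("bird", "apple", "mouse"))
)

# has_level is an illustrative helper, not a base R function.
# It tests the word against the factor levels (the distinct values)
# instead of comparing it with every element of the column.
has_level <- function(x, word) {
  if (!is.factor(x)) x <- factor(x)  # convert non-factor columns first
  word %in% levels(x)
}

# One logical per column, then keep the columns where it is TRUE
keep <- vapply(fdf, has_level, logical(1), word = "apple")
level_result <- fdf[keep]

names(level_result)
```

One caveat: a factor can carry levels that no longer occur in the data (for example after subsetting rows), so a match in levels(x) does not guarantee the value is still present. Calling droplevels() on the column first avoids that false positive. Converting a character column to a factor also scans the column once, so any gain mostly applies when the columns are already factors.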