My file looks like this:
ball cat bird ball cat cat ball
apple mouse apple apple mouse mouse apple
cat bat mouse cat bat bat cat
mouse ball bat ball ball ball ball
bat ball mouse bat bat bat bat
bird ball ball bird bird bird bird
I want to extract the columns that contain the word "apple".
Expected output:
ball bird ball ball
apple apple apple apple
cat mouse cat cat
mouse bat ball ball
bat mouse bat bat
bird ball bird bird
Answer 0 (score: 2)
There are many ways to do this, and I suspect this has already been answered somewhere.
1) Using colSums
df[colSums(df == "apple") > 0]
# V1 V3 V4 V7
#1 ball bird ball ball
#2 apple apple apple apple
#3 cat mouse cat cat
#4 mouse bat ball ball
#5 bat mouse bat bat
#6 bird ball bird bird
2) With apply
df[apply(df == "apple", 2, any)]
3) Using Filter
Filter(function(x) any(x == "apple"), df)
4) With dplyr
library(dplyr)
df %>% select_if(~any(. == "apple"))
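To see why the colSums version (1) works, it helps to look at the intermediate steps. The sketch below uses a small hypothetical data frame `toy` (not the question's data): comparing a data frame with a single string gives a logical matrix with one column per variable, `colSums` counts the TRUE values in each column, and `> 0` turns those counts into a logical vector that `[` uses to pick columns. The apply, Filter and dplyr versions compute the same per-column "any match" test in different ways.

```r
# Small hypothetical data frame, for illustration only
toy <- data.frame(
  V1 = c("ball", "apple"),
  V2 = c("cat", "mouse"),
  V3 = c("apple", "apple"),
  stringsAsFactors = FALSE
)

hits <- toy == "apple"      # logical matrix: TRUE where a cell equals "apple"
counts <- colSums(hits)     # "apple" cells per column: V1 = 1, V2 = 0, V3 = 2
keep <- counts > 0          # TRUE for columns with at least one match

print(hits)
print(counts)
toy[keep]                   # keeps V1 and V3
```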
Data
df <- structure(list(
  V1 = structure(c(2L, 1L, 5L, 6L, 3L, 4L), .Label = c("apple", "ball", "bat", "bird", "cat", "mouse"), class = "factor"),
  V2 = structure(c(3L, 4L, 2L, 1L, 1L, 1L), .Label = c("ball", "bat", "cat", "mouse"), class = "factor"),
  V3 = structure(c(4L, 1L, 5L, 3L, 5L, 2L), .Label = c("apple", "ball", "bat", "bird", "mouse"), class = "factor"),
  V4 = structure(c(2L, 1L, 5L, 2L, 3L, 4L), .Label = c("apple", "ball", "bat", "bird", "cat"), class = "factor"),
  V5 = structure(c(4L, 5L, 2L, 1L, 2L, 3L), .Label = c("ball", "bat", "bird", "cat", "mouse"), class = "factor"),
  V6 = structure(c(4L, 5L, 2L, 1L, 2L, 3L), .Label = c("ball", "bat", "bird", "cat", "mouse"), class = "factor"),
  V7 = structure(c(2L, 1L, 5L, 2L, 3L, 4L), .Label = c("apple", "ball", "bat", "bird", "cat"), class = "factor")
), class = "data.frame", row.names = c(NA, -6L))
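The `dput` output above recreates the data as factors. If you start from the text file itself, a sketch like the following should give an equivalent data frame. Here `text =` stands in for the file so the example is self-contained; with a real file you would pass its path instead (the name `words.txt` in the comment is hypothetical). Because the file has no header row, `read.table` names the columns `V1` to `V7`. Since R 4.0.0 the columns come back as character rather than factor by default, and the comparisons in the methods above work the same way on either type.

```r
# Read whitespace-separated words with no header row.
# With a real file: df <- read.table("words.txt")   # hypothetical file name
df <- read.table(text = "
ball cat bird ball cat cat ball
apple mouse apple apple mouse mouse apple
cat bat mouse cat bat bat cat
mouse ball bat ball ball ball ball
bat ball mouse bat bat bat bat
bird ball ball bird bird bird bird
", header = FALSE)

names(df)                          # "V1" "V2" ... "V7"
df[colSums(df == "apple") > 0]     # columns V1, V3, V4 and V7
```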
Answer 1 (score: 1)
We can use sapply from base R:
df[sapply(df, function(x) 'apple' %in% x)]
Data
df <- structure(list(V1 = structure(c(2L, 1L, 5L, 6L, 3L, 4L), .Label = c("apple",
"ball", "bat", "bird", "cat", "mouse"), class = "factor"), V2 = structure(c(3L,
4L, 2L, 1L, 1L, 1L), .Label = c("ball", "bat", "cat", "mouse"
), class = "factor"), V3 = structure(c(4L, 1L, 5L, 3L, 5L, 2L
), .Label = c("apple", "ball", "bat", "bird", "mouse"), class = "factor"),
V4 = structure(c(2L, 1L, 5L, 2L, 3L, 4L), .Label = c("apple",
"ball", "bat", "bird", "cat"), class = "factor"), V5 = structure(c(4L,
5L, 2L, 1L, 2L, 3L), .Label = c("ball", "bat", "bird", "cat",
"mouse"), class = "factor"), V6 = structure(c(4L, 5L, 2L,
1L, 2L, 3L), .Label = c("ball", "bat", "bird", "cat", "mouse"
), class = "factor"), V7 = structure(c(2L, 1L, 5L, 2L, 3L,
4L), .Label = c("apple", "ball", "bat", "bird", "cat"),
class = "factor")), class = "data.frame", row.names = c(NA,
-6L))
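One practical difference between `%in%` and the `==`-based versions in the other answer shows up when a column contains missing values: `x == "apple"` yields `NA` for a missing cell, so `any()` returns `NA` for a column that has an `NA` but no match, whereas `%in%` always returns `TRUE` or `FALSE`. The sketch below uses a hypothetical two-column data frame `dat` to show this; passing `na.rm = TRUE` to `any()` is one way to make the `==` versions behave the same.

```r
# Hypothetical data with a missing value in a column that has no "apple"
dat <- data.frame(
  A = c("apple", "cat"),
  B = c(NA, "ball"),
  stringsAsFactors = FALSE
)

any(dat$B == "apple")                 # NA: the missing cell might be "apple"
any(dat$B == "apple", na.rm = TRUE)   # FALSE once missing values are ignored
"apple" %in% dat$B                    # FALSE: %in% never returns NA

# %in% gives a clean logical vector, so column selection is unambiguous
keep <- sapply(dat, function(x) "apple" %in% x)
dat[keep]                             # only column A
```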
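Answer 2 (score: 0)
If the columns can hold a large number of distinct values and processing time becomes a concern, you can first convert the columns to factors and look for the match among their levels, instead of scanning every value in the data set.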
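A minimal sketch of that idea, assuming the columns are already factors (as in the `dput` data above): `levels(x)` holds each distinct value once, so `%in%` only scans the levels rather than every row. Two caveats apply. Converting character columns with `factor()` itself passes over every value, so the saving comes mainly when the data are already factors or are queried repeatedly. And a factor can keep levels that no longer occur in the data (for example after subsetting rows), which gives false matches; `droplevels()` fixes that at the cost of another pass over the data. The helper name `has_level` and the data frame `fdat` are hypothetical.

```r
# Hypothetical helper: does the factor column x have `value` among its levels?
has_level <- function(x, value) value %in% levels(x)

# Hypothetical factor data
fdat <- data.frame(
  A = factor(c("ball", "apple", "cat")),
  B = factor(c("cat", "mouse", "bat"))
)

keep <- vapply(fdat, has_level, logical(1), value = "apple")
fdat[keep]                                  # only column A

# Caveat: subsetting rows keeps unused levels
sub <- fdat[c(1, 3), ]                      # the "apple" row is gone
has_level(sub$A, "apple")                   # still TRUE: stale level
has_level(droplevels(sub$A), "apple")       # FALSE after dropping unused levels
```

With the question's factor `df`, `df[vapply(df, has_level, logical(1), value = "apple")]` should return the same four columns (V1, V3, V4, V7) as the other answers.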