Question

I have a data frame of three-word and two-word phrases, along with the count of each phrase found in a text. Here is some dummy data:

   trig <- c("took my dog", "took my cat", "took my hat", "ate my dinner", "ate my lunch")
   trig_count <- c(3, 2, 1, 3, 1)
   big <- c("took my", "took my", "took my", "ate my", "ate my")
   big_count <- c(6,6,6,4,4)
   df <- data.frame(trig, trig_count, big, big_count)
   df$trig <- as.character(df$trig)
   df$big <- as.character(df$big)

          trig    trig_count   big    big_count
   1  took my dog          3  took my         6
   2  took my cat          2  took my         6
   3  took my hat          1  took my         6
   4  ate my dinner        3  ate my          4
   5  ate my lunch         1  ate my          4

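I want to write a function that takes any two-word phrase as input and returns the matching rows of df if there is a match, or "no match" if there is none.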

I have tried variations of:

   match_test <- function(x){
                 ifelse(x %in% df$big==T, df[df$big==x,], "no match")
                 }

It works for two-word phrases that are not in df, for example:

    match_test("looked for")

returns

    "no match"

But for phrases that do have a match, it does not work, for example:

   match_test("took my")

returns

    "took my dog" "took my cat" "took my hat"

What I am looking for is:

           trig    trig_count   big    big_count
    1  took my dog          3  took my         6
    2  took my cat          2  took my         6
    3  took my hat          1  took my         6

Is there something about %in% that I'm not understanding? Or is it something else? Any guidance would be much appreciated.

Answer 1

You don't need ifelse; you can simply subset your original df, as @Ronak Shah suggested:

df[grep("took my", df$big), ]

If you want to turn it into a function that can still report no match, you can do this:

match_test <- function(match_string) {

  subset_df <- df[grep(match_string, df$big), ]

  if (nrow(subset_df) < 1) {
    warning("no match")
  } else {
    subset_df
  }  

}

match_test("took my")
#          trig trig_count     big big_count
# 1 took my dog          3 took my         6
# 2 took my cat          2 took my         6
# 3 took my hat          1 took my         6

And if there is nothing to match:

match_test("coffee")
# Warning message:
# In match_test("coffee") : no match
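As for why the original ifelse attempt returned only the trig values: ifelse is vectorised and returns a value with the same shape as its test, and here the test is a single TRUE or FALSE. A plain if / else has no such restriction. The following is a minimal sketch under that reading, not a definitive implementation; match_exact is a hypothetical name, and it compares with == so only exact matches on big are returned (grep, by contrast, treats its pattern as a regular expression and also matches partial strings).

```r
# Rebuild the question's dummy data so the sketch is self-contained.
df <- data.frame(
  trig = c("took my dog", "took my cat", "took my hat",
           "ate my dinner", "ate my lunch"),
  trig_count = c(3, 2, 1, 3, 1),
  big = c("took my", "took my", "took my", "ate my", "ate my"),
  big_count = c(6, 6, 6, 4, 4),
  stringsAsFactors = FALSE
)

# ifelse() returns something shaped like its test. The test below is a single
# logical value, so only one element of the "yes" argument comes back.
one_value <- ifelse("took my" %in% df$big, c("a", "b", "c"), "no match")
length(one_value)

# A plain if / else can return a whole data frame.
# match_exact is a hypothetical name used only for this illustration.
match_exact <- function(x) {
  if (x %in% df$big) {
    df[df$big == x, ]   # exact comparison, no regular expressions involved
  } else {
    "no match"
  }
}

match_exact("took my")
match_exact("looked for")
```

Unlike the warning-based version above, this variant returns the string "no match", which is what the question asked for.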

Answer 2

We can use str_detect:

library(stringr)
library(dplyr)
df %>% 
     filter(str_detect(big, "took my"))
#        trig trig_count     big big_count
#1 took my dog          3 took my         6
#2 took my cat          2 took my         6
#3 took my hat          1 took my         6
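The same filter can be wrapped in a function that falls back to "no match", which is closer to what the question asks for. This is only a sketch: match_fixed is a hypothetical helper name, and wrapping the input in stringr's fixed() is my assumption about the desired behaviour, so that the phrase is compared literally rather than interpreted as a regular expression. Note that it is still a substring test, so an input such as "my" would match every row.

```r
library(dplyr)
library(stringr)

# Rebuild the question's dummy data so the sketch is self-contained.
df <- data.frame(
  trig = c("took my dog", "took my cat", "took my hat",
           "ate my dinner", "ate my lunch"),
  trig_count = c(3, 2, 1, 3, 1),
  big = c("took my", "took my", "took my", "ate my", "ate my"),
  big_count = c(6, 6, 6, 4, 4),
  stringsAsFactors = FALSE
)

# match_fixed is a hypothetical name used only for this illustration.
match_fixed <- function(x, data = df) {
  # fixed() makes str_detect compare the literal text of x
  res <- filter(data, str_detect(big, fixed(x)))
  if (nrow(res) == 0) "no match" else res
}

match_fixed("ate my")
match_fixed("looked for")
```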

Answer 3

We can also try this:

library(stringr)
match_test <- function(x){
  res <- df[which(!is.na(str_match(df$big,x))),]
  if(nrow(res) == 0) return('no match')
  return(res)
}
match_test("looked for")
#[1] "no match"
match_test("took my")
#         trig trig_count     big big_count
#1 took my dog          3 took my         6
#2 took my cat          2 took my         6
#3 took my hat          1 took my         6
match_test("ate my")
#           trig trig_count    big big_count
#4 ate my dinner          3 ate my         4
#5  ate my lunch          1 ate my         4

If else statement to subset a data frame if a value is found in a variable in R
