Mutate a data frame and perform partial string matching

Time: 2019-08-19 15:16:58

Tags: r tidyverse

Suppose you have a data frame with lots of strings:

     x <- data.frame(name = c("Alice", "Alice", "Alice", "Bob", "Bob", "Charlie"),
                    prod = c("Hard Hat", "Goggles", "Bus Fare", "Goggles", "Training", "Hard Hat, Laptop"))


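How can I add a mutated column to this data frame (let's call it category) that classifies the data based on some arbitrary conditions? For example, how would I set x$category equal to "PPE" if the words 'Hard Hat' or 'Goggles' appear in x$prod, and to "IT" if the word 'Laptop' appears in x$prod?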
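For the exact-match part of the question alone, a minimal sketch with dplyr's `case_when()` and `%in%` might look like the following. It is an illustration, not code from the original post, and it assumes `stringsAsFactors = FALSE` so `prod` is a character column. It also shows why exact matching is not enough here: "Hard Hat, Laptop" equals neither listed value, so it falls through to `NA`.

```r
library(dplyr)

x <- data.frame(name = c("Alice", "Alice", "Alice", "Bob", "Bob", "Charlie"),
                prod = c("Hard Hat", "Goggles", "Bus Fare", "Goggles", "Training", "Hard Hat, Laptop"),
                stringsAsFactors = FALSE)

# Exact, case-sensitive matching: prod must equal one of the listed values
x <- x %>%
  mutate(category = case_when(
    prod %in% c("Hard Hat", "Goggles") ~ "PPE",
    prod %in% c("Laptop") ~ "IT",
    TRUE ~ NA_character_  # anything else is left uncategorized
  ))

x$category
# "PPE" "PPE" NA "PPE" NA NA
```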

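Additionally, if possible, I'd like the matching to handle partial matches and different capitalization. For example, "Bus Fare" might also be entered as (non-exhaustive list) "Bus Ticket", "bus fare", or "bus ticket"; in any of these cases I need it categorized as "TRANSPORT" because the word "bus" appears.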
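A minimal sketch of case-insensitive partial matching, assuming the stringr package is available: `regex(..., ignore_case = TRUE)` makes the pattern ignore case, and `str_detect()` reports whether the pattern occurs anywhere in each string. The input strings are hypothetical spellings, not data from the post. The `\\b` word boundaries are an optional addition that keeps "bus" from matching inside a longer word such as "Business".

```r
library(stringr)

# Hypothetical spellings of product names (not from the original data)
variants <- c("Bus Fare", "bus fare", "Bus Ticket", "Business Laptop", "Goggles")

# Case-insensitive substring match: "Business" also contains "bus"
loose <- str_detect(variants, regex("bus", ignore_case = TRUE))
loose
# TRUE TRUE TRUE TRUE FALSE

# Case-insensitive whole-word match: "bus" must be a separate word
strict <- str_detect(variants, regex("\\bbus\\b", ignore_case = TRUE))
strict
# TRUE TRUE TRUE FALSE FALSE
```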

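Expected output:

         name     prod  category
    1   Alice Hard Hat       PPE
    2   Alice  Goggles       PPE
    3   Alice Bus Fare TRANSPORT
    4     Bob  Goggles       PPE
    5     Bob Training  TRAINING
    6 Charlie   Laptop        IT

Ideally I'd like to solve this within the tidyverse. I imagine it will require a combination of mutate() and various string-matching functions, but I can't quite work out what the exact workflow would be.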

1 answer:

Answer (score: 2):

Depending on your situation, you may want to create a vector of keywords for each category and use str_detect inside chained case_when statements:

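    library(dplyr)    # mutate(), case_when(), %>%
    library(stringr)  # str_detect()

    x <- data.frame(name = c("Alice", "Alice", "Alice", "Bob", "Bob", "Charlie"),
                    prod = c("Hard Hat", "Goggles", "Bus Fare", "Goggles", "Training", "Hard Hat, Laptop"))

    # One vector of lowercase keywords per category
    transport <- c("bus")
    ppe <- c("goggles", "hard hat")
    tech <- c("laptop")
    training <- c("training")

    # tolower() makes the match case-insensitive, and paste(collapse = "|")
    # turns each keyword vector into a regex alternation such as "goggles|hard hat".
    # case_when() returns the first matching category, so "Hard Hat, Laptop"
    # is labeled PPE; rows that match no condition get NA.
    x <- x %>%
      mutate(
        category =
          case_when(
            str_detect(tolower(prod), paste(transport, collapse = "|")) ~ "TRANSPORT",
            str_detect(tolower(prod), paste(ppe, collapse = "|")) ~ "PPE",
            str_detect(tolower(prod), paste(tech, collapse = "|")) ~ "IT",
            str_detect(tolower(prod), paste(training, collapse = "|")) ~ "TRAINING"
          )
      )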
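If the number of categories grows, the same idea can be driven from a single keyword table instead of one `case_when()` branch per category. The sketch below is an illustration, not part of the original answer: `keywords` and `categorize()` are hypothetical names, it assumes dplyr and stringr, it uses `regex(ignore_case = TRUE)` with whole-word matching in place of `tolower()`, and it assigns a fallback label ("OTHER", also an assumption) where the answer's version would leave `NA`.

```r
library(dplyr)
library(stringr)

# Hypothetical keyword table: names are categories, values are lowercase
# keywords. Order matters, because each row keeps the first category
# whose keywords match.
keywords <- list(
  TRANSPORT = c("bus"),
  PPE       = c("goggles", "hard hat"),
  IT        = c("laptop"),
  TRAINING  = c("training")
)

# Hypothetical helper: for each string, returns the first category whose
# keywords appear as whole words (case-insensitive), or `default` otherwise.
# Keywords are used as regex fragments, so escape any metacharacters in them.
categorize <- function(text, keywords, default = "OTHER") {
  out <- rep(NA_character_, length(text))
  for (category in names(keywords)) {
    alternation <- paste(keywords[[category]], collapse = "|")
    pattern <- regex(paste0("\\b(", alternation, ")\\b"), ignore_case = TRUE)
    # Only fill rows that have not been categorized yet
    hit <- which(is.na(out) & str_detect(text, pattern))
    out[hit] <- category
  }
  out[is.na(out)] <- default
  out
}

x <- data.frame(name = c("Alice", "Alice", "Alice", "Bob", "Bob", "Charlie"),
                prod = c("Hard Hat", "Goggles", "Bus Fare", "Goggles", "Training", "Hard Hat, Laptop"),
                stringsAsFactors = FALSE)

x <- x %>% mutate(category = categorize(prod, keywords))
x$category
# "PPE" "PPE" "TRANSPORT" "PPE" "TRAINING" "PPE"
```

As with the chained `case_when()`, a row that mentions several categories ("Hard Hat, Laptop") gets whichever category comes first in the table, so reordering `keywords` changes the priority.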