Suppose you have a data frame containing a lot of strings:
x <- data.frame(name = c("Alice", "Alice", "Alice", "Bob", "Bob", "Charlie"),
prod = c("Hard Hat", "Goggles", "Bus Fare", "Goggles", "Training", "Hard Hat, Laptop"))
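How can I add a mutated column to this data frame (call it category) that classifies the data based on some arbitrary conditions? For example, how would I set x$category to "PPE" if the words 'Hard Hat' or 'Goggles' appear in x$prod, and to "IT" if the word 'Laptop' appears?
In addition, if possible, I would like the matching to handle partial matches and different letter cases. For example, "Bus Fare" might also be entered as (non-exhaustive list) "Bus ticket", "bus fare", or "bus ticket"; in every case I need it categorised as "TRANSPORT", because the word "bus" appears.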
Expected output:
     name     prod  category
1   Alice Hard Hat       PPE
2   Alice  Goggles       PPE
3   Alice Bus Fare TRANSPORT
4     Bob  Goggles       PPE
5     Bob Training  TRAINING
6 Charlie   Laptop        IT
Ideally I would like to solve this within tidyverse. I think it will need a combination of mutate() and various other functions, but I am not sure what the exact workflow would be.
Answer 0: (score: 2)
Depending on your situation, you could create a vector of keywords for each category and use str_detect inside a chained case_when statement:
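library(dplyr)    # mutate(), case_when(), %>%
library(stringr)  # str_detect()

x <- data.frame(name = c("Alice", "Alice", "Alice", "Bob", "Bob", "Charlie"),
                prod = c("Hard Hat", "Goggles", "Bus Fare", "Goggles", "Training", "Hard Hat, Laptop"))

# One keyword vector per category, in lower case to match tolower(prod)
transport <- c("bus")
ppe <- c("goggles", "hard hat")
tech <- c("laptop")
training <- c("training")

# case_when() uses the first condition that matches, so the order matters:
# "Hard Hat, Laptop" is labelled PPE here because the ppe test comes before tech.
x <- x %>%
  mutate(
    category = case_when(
      str_detect(tolower(prod), paste(transport, collapse = "|")) ~ "TRANSPORT",
      str_detect(tolower(prod), paste(ppe, collapse = "|")) ~ "PPE",
      str_detect(tolower(prod), paste(tech, collapse = "|")) ~ "IT",
      str_detect(tolower(prod), paste(training, collapse = "|")) ~ "TRAINING"
    )
  )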
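
As a variation on the approach above, the case handling can be moved into the pattern itself with stringr's regex(..., ignore_case = TRUE), so tolower() is no longer needed. The sketch below is only an illustration: the helper keyword_pattern(), the data frame y with its sample entries, and the "OTHER" fallback label are all assumptions made for this example and are not part of the question's data.

```r
library(dplyr)
library(stringr)

# Hypothetical helper (not from the original answer): turns a keyword
# vector into one case-insensitive "any of these words" pattern.
keyword_pattern <- function(words) {
  regex(paste(words, collapse = "|"), ignore_case = TRUE)
}

# Made-up entries in the spirit of the variants mentioned in the question
y <- data.frame(
  prod = c("Bus Ticket", "bus fare", "BUS TICKET", "hard hat", "Stapler"),
  stringsAsFactors = FALSE
)

y <- y %>%
  mutate(
    category = case_when(
      str_detect(prod, keyword_pattern(c("bus"))) ~ "TRANSPORT",
      str_detect(prod, keyword_pattern(c("goggles", "hard hat"))) ~ "PPE",
      TRUE ~ "OTHER"  # assumed fallback label for rows that match nothing
    )
  )
```

Because these are substring matches, "bus" would also match inside a word such as "business". If that matters for your data, wrapping each keyword in word boundaries (for example "\\bbus\\b") restricts the match to whole words.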