I have a database of thoroughbred names with the following structure:
HorseName <- c("Grey emperor", "Smokey grey", "Gaining greys", "chestnut", "Glowing Chestnuts", "Ruby red", "My fair lady", "Man of war")
Number <- seq(1:8)
df <- data.frame(HorseName, Number)
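I now want to search each horse's name for occurrences of a colour. Specifically, I want to pick out all instances of 'grey' and 'chestnut' and create a new column identifying those colours. Any other name can simply be 'Other'. Unfortunately, the names are inconsistent: they include plurals and different capitalisation. How would I do this in R?
My expected output is: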
df$Type <- c("Grey", "Grey", "Grey", "Chestnut", "Chestnut", "Other", "Other", "Other")
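I'm familiar with chained ifelse statements, but I'm not sure how to handle the plurals and the case sensitivity!
Answer 0 (score: 3)
If you're interested in another approach, here is a tidyverse
alternative that gives the same result as @amrrs's answer.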
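library(tidyverse)  # attaches dplyr, stringr and the other core packages
library(stringr)    # already attached by tidyverse; harmless but redundant
df %>%
  # lower-case the names, then extract the first "grey" or "chestnut" (NA if neither)
  mutate(Type = str_extract(str_to_lower(HorseName), "grey|chestnut")) %>%
  # replace NA with "other", then capitalise: "grey" -> "Grey"
  mutate(Type = str_to_title(if_else(is.na(Type), "other", Type)))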
#> HorseName Number Type
#> 1 Grey emperor 1 Grey
#> 2 Smokey grey 2 Grey
#> 3 Gaining greys 3 Grey
#> 4 chestnut 4 Chestnut
#> 5 Glowing Chestnuts 5 Chestnut
#> 6 Ruby red 6 Other
#> 7 My fair lady 7 Other
#> 8 Man of war 8 Other
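One caveat that applies to both answers: the pattern grey|chestnut matches anywhere in a name, so a name such as "Greyhound Express" (hypothetical, not in the question's data) would also be labelled Grey. If that matters, a word-boundary pattern with an optional plural "s" is stricter. A minimal base R sketch, assuming the question's names plus that one illustrative extra:

```r
# Names from the question plus one hypothetical name to show the difference
HorseName <- c("Grey emperor", "Smokey grey", "Gaining greys", "chestnut",
               "Glowing Chestnuts", "Ruby red", "My fair lady", "Man of war",
               "Greyhound Express")

# Loose: "grey" anywhere in the name, ignoring case
loose <- grepl("grey", HorseName, ignore.case = TRUE)

# Strict: \\b anchors at word edges, s? allows an optional plural "s"
strict <- grepl("\\bgreys?\\b", HorseName, ignore.case = TRUE, perl = TRUE)

# The two only disagree on the hypothetical "Greyhound Express"
print(data.frame(HorseName, loose, strict))
```

If the same pattern is passed to str_extract() instead, plural names would come back as "greys", so the trailing "s" would have to be dropped before str_to_title().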
Answer 1 (score: 2)
Converting all of the input text in df$HorseName to lowercase (and writing the patterns in lowercase) before matching with grepl solves the problem.
> df$Type <- ifelse(grepl('grey',tolower(df$HorseName)),'Grey',
+ ifelse(grepl('chestnut',tolower(df$HorseName)),'Chestnut',
+ 'others'))
> df
HorseName Number Type
1 Grey emperor 1 Grey
2 Smokey grey 2 Grey
3 Gaining greys 3 Grey
4 chestnut 4 Chestnut
5 Glowing Chestnuts 5 Chestnut
6 Ruby red 6 others
7 My fair lady 7 others
8 Man of war 8 others
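Two variations on the same idea, sketched under the assumption that only base R is available. grepl() can fold case itself via ignore.case = TRUE, which makes tolower() unnecessary; and once more colours are involved, a named lookup vector is easier to extend than nested ifelse() calls. The names colour_lookup and label_colours() are illustrative only, not from any package. This version also uses "Other" rather than "others" so the result matches the expected output in the question:

```r
df <- data.frame(
  HorseName = c("Grey emperor", "Smokey grey", "Gaining greys", "chestnut",
                "Glowing Chestnuts", "Ruby red", "My fair lady", "Man of war"),
  Number = 1:8
)

# Variant 1: let grepl() ignore case instead of calling tolower()
df$Type <- ifelse(grepl("grey", df$HorseName, ignore.case = TRUE), "Grey",
           ifelse(grepl("chestnut", df$HorseName, ignore.case = TRUE), "Chestnut",
                  "Other"))

# Variant 2: hypothetical lookup vector mapping pattern -> label
colour_lookup <- c(grey = "Grey", chestnut = "Chestnut")

label_colours <- function(x, lookup, default = "Other") {
  out <- rep(default, length(x))
  # Walk the patterns in reverse so the first entry wins when several match,
  # mirroring the priority of the nested ifelse() above
  for (pattern in rev(names(lookup))) {
    out[grepl(pattern, x, ignore.case = TRUE)] <- lookup[[pattern]]
  }
  out
}

df$Type2 <- label_colours(df$HorseName, colour_lookup)
print(df)
```

Adding a colour to Variant 2 then only means adding an entry such as bay = "Bay" to the lookup vector.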