Here is a sample dataset:
library(tidyverse)  # provides tibble() and the %>% pipe used below

df <- tibble(
  size = c("l", "L/Black", "medium", "small", "large", "L/White", "s",
           "L/White", "M", "S/Blue", "M/White", "L/Navy", "M/Navy", "S"),
  shirt = c("blue", "black", "black", "black", "white", "white", "purple",
            "white", "purple", "blue", "white", "navy", "navy", "navy")
)
The dataset above has a column size, which contains the basic values small, medium and large. It also contains other representations of those sizes, such as M, S/Blue or s.
I want to turn every value into small, medium or large in the most efficient way, and get rid of the colours in the size column. For example, L/Black should become large.
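I could do this with repeated calls to gsub, but I am wondering whether there is a more efficient approach than my initial idea. My dataset has a few thousand rows, and the code sample below is bad: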
df$size <- df$size %>%
  gsub("M", "medium", .) %>%
  gsub("mediumedium", "medium", .) %>%
  gsub("S", "small", .) %>%
  gsub("smallmall", "small", .) %>%
  gsub("L", "large", .) %>%
  gsub("S/Blue", "small", .) %>%
  gsub("L/Navy", "large", .)
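This approach does not work well, because running the first two gsub calls above introduces strings such as mediumedium or smallmall. What is the best way to standardise all three main sizes?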
Answer 0 (score: 1)
library("tidyverse")

df %>%
  # Extract the alphanum substring at the start of "size"
  extract(size, "size2", regex = "^(\\w*)", remove = FALSE) %>%
  # All lowercase in case there are sizes like "Small"
  # And then recode as required.
  # Here "l" = "large" means take all occurrences of "l" and
  # recode them as "large", etc.
  mutate(size3 = recode(tolower(size2),
                        "l" = "large",
                        "m" = "medium",
                        "s" = "small"))
# # A tibble: 14 x 4
#   size    size2  shirt size3
#   <chr>   <chr>  <chr> <chr>
# 1 l       l      blue  large
# 2 L/Black L      black large
# 3 medium  medium black medium
# 4 small   small  black small
# 5 large   large  white large
Of course, you do not need three size columns. I used different column names so that it is clear what each transformation achieves.
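If only the cleaned column is wanted, the same two steps can be collapsed into a single mutate() that overwrites size in place. The following is a minimal sketch rather than part of the original answer: it assumes every value starts with the size token, and it swaps tidyr's extract() for stringr's str_extract(); the name df_clean is only illustrative.

```r
library(dplyr)
library(stringr)
library(tibble)

df <- tibble(
  size = c("l", "L/Black", "medium", "small", "large", "L/White", "s",
           "L/White", "M", "S/Blue", "M/White", "L/Navy", "M/Navy", "S"),
  shirt = c("blue", "black", "black", "black", "white", "white", "purple",
            "white", "purple", "blue", "white", "navy", "navy", "navy")
)

# df_clean is a hypothetical name used only for this sketch
df_clean <- df %>%
  mutate(size = size %>%
           str_extract("^\\w+") %>%   # leading word characters, e.g. "L" from "L/Black"
           tolower() %>%              # "L" -> "l", "M" -> "m", "S" -> "s"
           recode("l" = "large",      # values without a match ("small", ...) are kept
                  "m" = "medium",
                  "s" = "small"))
```

Values that do not begin with a word character would come out as NA, so it may be worth checking anyNA(df_clean$size) on real data.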
Answer 1 (score: 1)
A solution using tidyverse.
library(tidyverse)

df2 <- df %>%
  # Remove color
  mutate(size = map2_chr(size, shirt, ~str_replace(.x, fixed(.y, ignore_case = TRUE), ""))) %>%
  # Remove /
  mutate(size = str_replace(size, fixed("/"), "")) %>%
  # Replacement
  mutate(size = case_when(
    size %in% c("l", "L") ~ "large",
    size %in% c("m", "M") ~ "medium",
    size %in% c("s", "S") ~ "small",
    TRUE                  ~ size
  ))

df2
# # A tibble: 14 x 2
#    size   shirt
#    <chr>  <chr>
#  1 large  blue
#  2 large  black
#  3 medium black
#  4 small  black
#  5 large  white
#  6 large  white
#  7 small  purple
#  8 large  white
#  9 medium purple
# 10 small  blue
# 11 medium white
# 12 large  navy
# 13 medium navy
# 14 small  navy
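For comparison, the same normalisation can be sketched in base R without any packages, using a named lookup vector instead of chained replacements. This is an illustration under the assumption that the colour always follows a "/"; the names key, lookup and size_clean are made up for the example.

```r
size <- c("l", "L/Black", "medium", "small", "large", "L/White", "s",
          "L/White", "M", "S/Blue", "M/White", "L/Navy", "M/Navy", "S")

# Drop everything from the first "/" onward, then lowercase the remainder
key <- tolower(sub("/.*$", "", size))

# Hypothetical lookup table: abbreviations and full names both map to the full name
lookup <- c(s = "small", m = "medium", l = "large",
            small = "small", medium = "medium", large = "large")

# Index the lookup by name; unname() drops the names carried over from lookup
size_clean <- unname(lookup[key])
```

Because each value is looked up exactly once, a replacement can never be re-matched by a later pattern, which is what goes wrong with the chained gsub calls in the question. Any value missing from lookup becomes NA, which makes unexpected inputs easy to spot.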