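I have two data frames: one contains product names and the other contains categories. I need to match the categories against the product names. If a category string occurs in a product name, that category should be assigned to the name.
The first data frame, which contains the product names (Product_Name.csv), is: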
**Product.Name**
Black Printed Blouse
Silver Embellished Crop Top
Maroon Solid Strappy Top
The other data frame, which contains the categories (Category.csv), is:
**Category**
Strappy
Blouse
Crop
The final output should be:
Black Printed Blouse Blouse
Silver Embellished Crop Top Crop
Maroon Solid Strappy Top Strappy
Right now I am using grepl, which only returns TRUE or FALSE:
product <- read.csv("Product_Name.csv", header = TRUE, sep = ",")
category <- read.csv("Category.csv", header = TRUE, sep = ",")
for (i in 1:nrow(product)){
  # one logical column per category (R is case-sensitive: the object is `category`)
  product[i, 2] <- grepl(category$Category[1], product$Product.Name[i], ignore.case = TRUE)
  product[i, 3] <- grepl(category$Category[2], product$Product.Name[i], ignore.case = TRUE)
  product[i, 4] <- grepl(category$Category[3], product$Product.Name[i], ignore.case = TRUE)
}
Answer 0 (score: 1)
We can use str_extract with the categories joined into a single alternation pattern:
library(stringr)
product$Category <- str_extract(product$Product.Name, paste(category$Category, collapse="|"))
product
# Product.Name Category
#1 Black Printed Blouse Blouse
#2 Silver Embellished Crop Top Crop
#3 Maroon Solid Strappy Top Strappy
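The question's grepl call used ignore.case = TRUE, whereas the pattern above is matched case-sensitively. As a rough sketch only, the same alternation idea can be written in base R with regexpr and regmatches, ignoring case and mapping the matched text back to the spelling used in the category table. The sample rows (including the lower-case "crop" and the unmatched "Plain White Shirt") and the helper variable `found` are made up for illustration, and the sketch assumes the category names contain no regex metacharacters.

```r
# Illustrative data; in the thread these come from the two CSV files
product <- data.frame(
  Product.Name = c("Black Printed Blouse", "Silver Embellished crop Top",
                   "Maroon Solid Strappy Top", "Plain White Shirt"),
  stringsAsFactors = FALSE
)
category <- data.frame(Category = c("Strappy", "Blouse", "Crop"),
                       stringsAsFactors = FALSE)

# "Strappy|Blouse|Crop": one pattern that matches any category
pattern <- paste(category$Category, collapse = "|")

# Position of the first match in each name (-1 where nothing matches)
m <- regexpr(pattern, product$Product.Name, ignore.case = TRUE)

# regmatches() returns text only for the names that matched
found <- rep(NA_character_, nrow(product))
found[m != -1] <- regmatches(product$Product.Name, m)

# Map the matched text back to the category spelling, ignoring case
product$Category <- category$Category[
  match(tolower(found), tolower(category$Category))
]
print(product)
```

Names without any matching category keep NA in the new column, so they are easy to find with is.na(product$Category).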
Answer 1 (score: 0)
Using base R:
# for each category, the row of the product whose name contains it
indices = sapply(category$Category, function(x) which(grepl(x, product$Product.Name)))
# start from NA so that rows without a match are not left holding a row number
product$new_col = NA_character_
product$new_col[indices] = names(indices)
#> product
# Product.Name new_col
#1 Black Printed Blouse Blouse
#2 Silver Embellished Crop Top Crop
#3 Maroon Solid Strappy Top Strappy
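# in case some categories match no product (which we need to handle well),
# the code below manages both situations (a generalised version)
category$Category[2] = "Bloiuse"   # deliberately misspelled so that it matches nothing
product$new_col = NA_character_    # reset the column; unmatched rows stay NA
# simplify = FALSE always gives a named list, even when a category has no match
indices = sapply(as.character(category$Category), function(x) grep(x, product$Product.Name), simplify = FALSE)
indices.loc <- unlist(indices, use.names = FALSE)
# repeat each category name once per row it matched (zero times if none)
indices.name <- rep(names(indices), lengths(indices))
product$new_col[indices.loc] = indices.name
#> product
# Product.Name new_col
#1 Black Printed Blouse <NA>
#2 Silver Embellished Crop Top Crop
#3 Maroon Solid Strappy Top Strappy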
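The grepl loop from the question can also be vectorised. The following is a minimal sketch, not part of either answer: it builds the same TRUE/FALSE columns in one call and then picks the first matching category per row. The sample data, the `cats` vector, and the `hits` and `idx` names are assumptions for illustration, and the sketch assumes more than one product row (with a single row, sapply would return a vector rather than a matrix).

```r
# Illustrative data; the last row matches no category
product <- data.frame(
  Product.Name = c("Black Printed Blouse", "Silver Embellished Crop Top",
                   "Maroon Solid Strappy Top", "Plain White Shirt"),
  stringsAsFactors = FALSE
)
cats <- c("Strappy", "Blouse", "Crop")

# Logical matrix: one row per product, one column per category
hits <- sapply(cats, grepl, x = product$Product.Name, ignore.case = TRUE)

# Column index of the first TRUE in each row
idx <- max.col(hits + 0, ties.method = "first")
# Rows with no TRUE at all would otherwise default to column 1
idx[rowSums(hits) == 0] <- NA

product$Category <- cats[idx]
print(product)
```

If a product name contains several categories, this keeps the one that appears first in `cats`; a different tie-breaking rule would need a different selection step.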