I have a large data frame of goods descriptions (about 11,000 rows). I want to fill the other columns based on words found in Goods.Description.
b <- data.frame(id = c('1','2', '3', '4'), Goods.Description = c("This green T-shirt can become...", "Stripes of unfaded denim at each side of this blue skirt make...", "Velvet's Brynna red top comes in a bohemian...", "The Riley blue jeans are Paige's take on..."), Jeans = c(0,0,0,0), T.Shirt = c(0,0,0,0), Skirt = c(0,0,0,0), Top = c(0,0,0,0), Color = c(0,0,0,0))
Data frame:
id Goods.Description Jeans T.Shirt Skirt Top Color
1 1 This green T-shirt can become... 0 0 0 0 0
2 2 Stripes of unfaded denim at each side of this blue skirt make... 0 0 0 0 0
3 3 Velvet's Brynna red top comes in a bohemian... 0 0 0 0 0
4 4 The Riley blue jeans are Paige's take on... 0 0 0 0 0
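For example:
If Goods.Description contains the word "T-shirt", put 1 in T.Shirt, otherwise 0.
If Goods.Description contains the word "jeans", put 1 in Jeans, otherwise 0.
If Goods.Description contains the word "skirt", put 1 in Skirt, otherwise 0.
If Goods.Description contains the word "top", put 1 in Top, otherwise 0.
If Goods.Description contains the word "green", put green in Color, otherwise 0.
If Goods.Description contains the word "blue", put blue in Color, otherwise 0.
And so on.
After: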
id Goods.Description Jeans T.Shirt Skirt Top Color
1 1 This green T-shirt can become... 0 1 0 0 green
2 2 Stripes of unfaded denim at each side of this blue skirt make... 0 0 1 0 blue
3 3 Velvet's Brynna red top comes in a bohemian... 0 0 0 1 red
4 4 The Riley blue jeans are Paige's take on... 1 0 0 0 blue
I don't know what the code should look like. Please help me.
Answer 0: (score: 2)
We can do this by extracting the color, and the specific words derived from the column names, out of each description.
library(stringr)
b$Color <- str_extract(b$Goods.Description, 'green|blue|red')
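# keywords derived from the column names: "JEANS", "T-SHIRT", "SKIRT", "TOP"
v1 <- toupper(sub(".", "-", names(b)[3:6], fixed = TRUE))
# first keyword found in each description (e.g. "T-SHIRT" for row 1)
found <- str_extract(toupper(b$Goods.Description), paste(v1, collapse="|"))
# match(found, v1) gives, per row, the column position within b[3:6] to set to 1
b[3:6][cbind(1:nrow(b), match(found, v1))] <- 1
b
# id Goods.Description Jeans T.Shirt Skirt Top Color
#1 1 This green T-shirt can become... 0 1 0 0 green
#2 2 Stripes of unfaded denim at each side of this blue skirt make... 0 0 1 0 blue
#3 3 Velvet's Brynna red top comes in a bohemian... 0 0 0 1 red
#4 4 The Riley blue jeans are Paige's take on... 1 0 0 0 blue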
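As an alternative that does not depend on the column order or on stringr, here is a minimal base R sketch. The `keywords` lookup vector and the `colour_pattern` string are hypothetical names introduced for illustration, and the word-boundary matching is an assumption about what "contains the word" should mean for the full 11,000 rows.

```r
# Sample data from the question (only the columns needed here).
b <- data.frame(
  id = c("1", "2", "3", "4"),
  Goods.Description = c(
    "This green T-shirt can become...",
    "Stripes of unfaded denim at each side of this blue skirt make...",
    "Velvet's Brynna red top comes in a bohemian...",
    "The Riley blue jeans are Paige's take on..."
  ),
  stringsAsFactors = FALSE
)

# Hypothetical lookup: target column name -> word to search for.
keywords <- c(Jeans = "jeans", T.Shirt = "t-shirt", Skirt = "skirt", Top = "top")

# One 0/1 column per keyword. \\b restricts the match to whole words,
# so "top" would not match inside a word such as "stop".
for (col in names(keywords)) {
  pattern <- paste0("\\b", keywords[[col]], "\\b")
  b[[col]] <- as.integer(
    grepl(pattern, b$Goods.Description, ignore.case = TRUE, perl = TRUE)
  )
}

# First listed colour mentioned in each description, NA when none is found.
colour_pattern <- "\\b(green|blue|red)\\b"
hits <- regexpr(colour_pattern, b$Goods.Description, ignore.case = TRUE, perl = TRUE)
b$Color <- NA_character_
b$Color[hits > 0] <- tolower(regmatches(b$Goods.Description, hits))

b
```

Unlike the positional approach above, a description that mentions two garment words gets a 1 in both columns, and rows without a listed colour get NA rather than 0.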
Answer 1: (score: 2)
library(data.table)
b <- data.frame(id = c('1','2', '3', '4'), Goods.Description = c("This green T-shirt can become...", "Stripes of unfaded denim at each side of this blue skirt make...", "Velvet's Brynna red top comes in a bohemian...", "The Riley blue jeans are Paige's take on..."), Jeans = c(0,0,0,0), T.Shirt = c(0,0,0,0), Skirt = c(0,0,0,0), Top = c(0,0,0,0), Color = c(0,0,0,0))
str(b)
setDT(b) # convert to data.table for better performance...
b[, Jeans := as.integer(grepl("jeans", Goods.Description, fixed = TRUE))]
b[, Skirt := as.integer(grepl("skirt", Goods.Description, fixed = TRUE))]
# etc. for each keyword
# Collect the colors in the "Color" target column
# initialize with empty string instead of zero (implicitly converting the col class to character)
b[, Color := NULL]
b[, Color := ""]
for (a.color in c("red", "green", "blue", "yellow"))
  b[grepl(a.color, Goods.Description, fixed = TRUE), Color := trimws(paste(Color, a.color))] # paste color names to keep all colors; trimws() drops the leading space left by the initial ""
b
Result (T.Shirt and Top stay 0 because only Jeans and Skirt are filled in above):
id Goods.Description Jeans T.Shirt Skirt Top Color
1: 1 This green T-shirt can become... 0 0 0 0 green
2: 2 Stripes of unfaded denim at each side of this blue skirt make... 0 0 1 0 blue
3: 3 Velvet's Brynna red top comes in a bohemian... 0 0 0 0 red
4: 4 The Riley blue jeans are Paige's take on... 1 0 0 0 blue
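The "# etc. for each keyword" step could be written as a loop instead of one line per column. This is only a sketch: the `keywords` vector is a hypothetical name, and because `fixed = TRUE` is case-sensitive and matches substrings, the search strings are assumed to be spelled exactly as they appear in the descriptions.

```r
library(data.table)

# Sample data from the question, built directly as a data.table.
b <- data.table(
  id = c("1", "2", "3", "4"),
  Goods.Description = c(
    "This green T-shirt can become...",
    "Stripes of unfaded denim at each side of this blue skirt make...",
    "Velvet's Brynna red top comes in a bohemian...",
    "The Riley blue jeans are Paige's take on..."
  )
)

# Hypothetical lookup: target column name -> exact string to search for.
keywords <- c(Jeans = "jeans", T.Shirt = "T-shirt", Skirt = "skirt", Top = "top")

# (col) := ... assigns by reference to the column whose name is stored in col.
for (col in names(keywords)) {
  b[, (col) := as.integer(grepl(keywords[[col]], Goods.Description, fixed = TRUE))]
}

b
```

With the four sample rows this fills all four indicator columns. On the full data a plain substring search for "top" would also hit words like "stop", so a word-boundary regular expression may be a better fit there.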