IF ... ELSE statements for text analysis

Time: 2017-03-19 08:28:15

Tags: r if-statement dataframe

I have a large data frame of goods descriptions (about 11,000 rows). I want to extract new variables (product type and product color) from Goods.Description:

b <- data.frame(id = c('1','2', '3', '4'), Goods.Description = c("This green T-shirt can become...", "Stripes of unfaded denim at each side of this blue skirt make...", "Velvet's Brynna red top comes in a bohemian...", "The Riley blue jeans are Paige's take on..."), Jeans = c(0,0,0,0), T.Shirt = c(0,0,0,0), Skirt = c(0,0,0,0), Top = c(0,0,0,0), Color = c(0,0,0,0))

Data frame:

  id                                                Goods.Description Jeans T.Shirt Skirt Top Color
1  1                                 This green T-shirt can become...     0       0     0   0     0
2  2 Stripes of unfaded denim at each side of this blue skirt make...     0       0     0   0     0
3  3                   Velvet's Brynna red top comes in a bohemian...     0       0     0   0     0
4  4                      The Riley blue jeans are Paige's take on...     0       0     0   0     0

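For example, if Goods.Description contains the word "T-shirt", put 1 in T.Shirt, otherwise 0.

If Goods.Description contains the word "jeans", put 1 in Jeans, otherwise 0.

If Goods.Description contains the word "skirt", put 1 in Skirt, otherwise 0.

If Goods.Description contains the word "top", put 1 in Top, otherwise 0.

If Goods.Description contains the word "green", put green in Color, otherwise 0.

If Goods.Description contains the word "blue", put blue in Color, otherwise 0.

And so on.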

Desired result:

  id                                                Goods.Description Jeans T.Shirt Skirt Top Color
1  1                                 This green T-shirt can become...     0       1     0   0 green
2  2 Stripes of unfaded denim at each side of this blue skirt make...     0       0     1   0  blue
3  3                   Velvet's Brynna red top comes in a bohemian...     0       0     0   1   red
4  4                      The Riley blue jeans are Paige's take on...     1       0     0   0  blue

I don't know what the code should look like. Please help.

2 answers:

Answer 0 (score: 2)

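We can do this by extracting the color from each description and matching the product-type keywords, which are derived from the column names: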
library(stringr)
b$Color <- str_extract(b$Goods.Description, 'green|blue|red')
# Keywords derived from the type columns: "JEANS", "T-SHIRT", "SKIRT", "TOP"
v1 <- toupper(sub(".", "-", names(b)[3:6], fixed = TRUE))
# For each row, find which keyword (if any) occurs first in the description
hit <- match(str_extract(toupper(b$Goods.Description), paste(v1, collapse="|")), v1)
# Set the matching (row, column) cells to 1, skipping rows without a keyword
ok <- !is.na(hit)
b[3:6][cbind(which(ok), hit[ok])] <- 1

b
#   id                                                Goods.Description Jeans T.Shirt Skirt Top Color
#1  1                                 This green T-shirt can become...     0       1     0   0 green
#2  2 Stripes of unfaded denim at each side of this blue skirt make...     0       0     1   0  blue
#3  3                   Velvet's Brynna red top comes in a bohemian...     0       0     0   1   red
#4  4                      The Riley blue jeans are Paige's take on...     1       0     0   0  blue
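
For comparison, here is a minimal base-R sketch of the same idea without stringr. It assumes that keywords should match case-insensitively and only as whole words, and that the keyword for a column is its name with "." replaced by "-"; neither assumption is stated in the question. The names `items`, `type_cols` and `colors` are hypothetical.

```r
# Hypothetical sample data mirroring the question's data frame
items <- data.frame(
  id = c("1", "2", "3", "4"),
  Goods.Description = c(
    "This green T-shirt can become...",
    "Stripes of unfaded denim at each side of this blue skirt make...",
    "Velvet's Brynna red top comes in a bohemian...",
    "The Riley blue jeans are Paige's take on..."
  ),
  stringsAsFactors = FALSE
)

type_cols <- c("Jeans", "T.Shirt", "Skirt", "Top")

# One 0/1 indicator column per product type
for (col in type_cols) {
  keyword <- sub(".", "-", col, fixed = TRUE)      # "T.Shirt" -> "T-Shirt"
  pattern <- paste0("\\b", keyword, "\\b")         # whole-word match only
  items[[col]] <- as.integer(
    grepl(pattern, items$Goods.Description, ignore.case = TRUE, perl = TRUE)
  )
}

# First color word found in each description, NA when none is present
colors <- c("green", "blue", "red")
color_pattern <- paste0("\\b(", paste(colors, collapse = "|"), ")\\b")
m <- regexpr(color_pattern, items$Goods.Description,
             ignore.case = TRUE, perl = TRUE)
items$Color <- NA_character_
items$Color[m > 0] <- tolower(regmatches(items$Goods.Description, m))

items
```

The word boundaries keep "top" from matching inside words such as "laptop". A description that mentions several product types gets a 1 in each matching column, which may or may not be what is wanted.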

Answer 1 (score: 2)

library(data.table)

b <- data.frame(id = c('1','2', '3', '4'), Goods.Description = c("This green T-shirt can become...", "Stripes of unfaded denim at each side of this blue skirt make...", "Velvet's Brynna red top comes in a bohemian...", "The Riley blue jeans are Paige's take on..."), Jeans = c(0,0,0,0), T.Shirt = c(0,0,0,0), Skirt = c(0,0,0,0), Top = c(0,0,0,0), Color = c(0,0,0,0))
str(b)

setDT(b) # convert to data.table for better performance...

b[, Jeans := as.integer(grepl("jeans", Goods.Description, fixed = TRUE))]
b[, Skirt := as.integer(grepl("skirt", Goods.Description, fixed = TRUE))]
# etc. for each keyword

# Collect the colors in the "Color" target column

# initialize with empty string instead of zero (implicitly converting the col class to character)
b[, Color := NULL]
b[, Color := ""]
for (a.color in c("red", "green", "blue", "yellow"))
  b[grepl(a.color, Goods.Description, fixed = TRUE), Color := paste(Color, a.color)] # paste color names to keep all colors

b

Result:

   id                                                Goods.Description Jeans T.Shirt Skirt Top  Color
1:  1                                 This green T-shirt can become...     0       0     0   0  green
2:  2 Stripes of unfaded denim at each side of this blue skirt make...     0       0     1   0   blue
3:  3                   Velvet's Brynna red top comes in a bohemian...     0       0     0   0    red
4:  4                      The Riley blue jeans are Paige's take on...     1       0     0   0   blue
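
In the result above, T.Shirt and Top stay 0 because only the Jeans and Skirt lines were written out, and each color carries a leading space from `paste()`. A minimal sketch of how the "etc." step might be completed follows. The lookup vector `keywords` and the table name `dt` are hypothetical, and case-insensitive matching is an assumption (the description says "T-shirt", so a case-sensitive search for "t-shirt" would miss it).

```r
library(data.table)

# Hypothetical sample data mirroring the question's data frame
dt <- data.table(
  id = c("1", "2", "3", "4"),
  Goods.Description = c(
    "This green T-shirt can become...",
    "Stripes of unfaded denim at each side of this blue skirt make...",
    "Velvet's Brynna red top comes in a bohemian...",
    "The Riley blue jeans are Paige's take on..."
  )
)

# Hypothetical lookup: target column name -> keyword to search for
keywords <- c(Jeans = "jeans", T.Shirt = "t-shirt", Skirt = "skirt", Top = "top")

# Add one 0/1 column per keyword by reference
for (col in names(keywords)) {
  dt[, (col) := as.integer(
    grepl(keywords[[col]], Goods.Description, ignore.case = TRUE)
  )]
}

# Collect every color found, separated by single spaces with no leading space
dt[, Color := ""]
for (a.color in c("red", "green", "blue", "yellow")) {
  dt[grepl(a.color, Goods.Description, fixed = TRUE),
     Color := trimws(paste(Color, a.color))]
}

dt
```

Because these patterns match substrings, "top" would also hit words such as "stop"; wrapping the keyword in word boundaries is one way to tighten that if it matters for the real data.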