Return a value in R if a text string contains certain content

Asked: 2016-01-28 11:22:29

Tags: r

Suppose I have a data frame df:

Product                                   Category   
Bill Payment for Torrent Power Limited    
Recharge of Videocon d2h DTH              
Bill Payment of Airtel Mobile
Recharge of Idea Mobile

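Now, if a string contains "Bill Payment" and "Mobile", I want to set its Category to "Postpaid"; if a string contains "Recharge" and "Mobile", I want to set it to "Prepaid".

I am a beginner in R, so the easiest solution to understand would be best.

The result should be: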

Product                                   Category   
Bill Payment for Torrent Power Limited    NA
Recharge of Videocon d2h DTH              NA
Bill Payment of Airtel Mobile             Postpaid
Recharge of Idea Mobile                   Prepaid

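2 answers:

Answer 0 (score: 4)

We can use grepl to get logical indices for the 'Product' values that contain both 'Bill Payment' and 'Mobile' ('i1') or both 'Recharge' and 'Mobile' ('i2'). After initializing 'Category' to NA, we replace the elements selected by i1 and i2.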

i1 <- grepl('Bill Payment', df1$Product) & grepl('Mobile', df1$Product)
i2 <- grepl('Recharge', df1$Product) & grepl('Mobile', df1$Product)
df1$Category <- NA
df1$Category[i1] <- 'Postpaid'
df1$Category[i2] <- 'Prepaid'
df1$Category
#[1] NA         NA         "Postpaid" "Prepaid" 

Or a slightly more compact option (it works for this example) is

i1 <- grepl('.*Bill Payment.*Mobile.*', df1$Product)
i2 <- grepl('.*Recharge.*Mobile.*', df1$Product)

and then use ifelse to assign the categories.
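The ifelse step is not spelled out above, so here is a minimal sketch of one way it could look. The data.frame construction is an assumption that rebuilds the four rows from the question, and the nested ifelse is only one possible reading of the suggestion.

```r
# Rebuild the example data from the question (assumed construction)
df1 <- data.frame(
  Product = c("Bill Payment for Torrent Power Limited",
              "Recharge of Videocon d2h DTH",
              "Bill Payment of Airtel Mobile",
              "Recharge of Idea Mobile"),
  stringsAsFactors = FALSE
)

# Logical indices from the compact patterns
i1 <- grepl("Bill Payment.*Mobile", df1$Product)
i2 <- grepl("Recharge.*Mobile", df1$Product)

# Nested ifelse: "Postpaid" where i1, otherwise "Prepaid" where i2, otherwise NA
df1$Category <- ifelse(i1, "Postpaid", ifelse(i2, "Prepaid", NA_character_))
df1$Category
```

Using NA_character_ rather than a plain NA keeps the column a character vector even if no row matches.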

Answer 1 (score: 3)

Another approach is to first create a numeric index and then use it to look up the corresponding values:

indx <- (grepl('Bill Payment', df1$Product) & grepl('Mobile', df1$Product)) + 
  (grepl('Recharge', df1$Product) & grepl('Mobile', df1$Product))*2 + 1L

df1$category <- c(NA, "Postpaid", "Prepaid")[indx]

which gives:

> df1
                                 Product category
1 Bill Payment for Torrent Power Limited     <NA>
2           Recharge of Videocon d2h DTH     <NA>
3          Bill Payment of Airtel Mobile Postpaid
4                Recharge of Idea Mobile  Prepaid
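To see why the index works, it may help to look at the arithmetic on its own. The sketch below uses hand-written logical vectors (named bill and recharge here purely for illustration) in place of the grepl results for the four example rows.

```r
# In arithmetic, TRUE is treated as 1 and FALSE as 0
bill     <- c(FALSE, FALSE, TRUE,  FALSE)  # "Bill Payment" and "Mobile" both present
recharge <- c(FALSE, FALSE, FALSE, TRUE)   # "Recharge" and "Mobile" both present

# neither match -> 1, bill only -> 2, recharge only -> 3
indx <- bill + recharge * 2 + 1L
indx

# Position 1 holds NA, position 2 "Postpaid", position 3 "Prepaid"
c(NA, "Postpaid", "Prepaid")[indx]
```

If a string matched both conditions, the index would be 4, which is past the end of the lookup vector, so that row would also come out as NA.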

You can also create this index with the more compact notation proposed by @akrun:

indx <- grepl('.*Bill Payment.*Mobile.*', df1$Product) + 
  grepl('.*Recharge.*Mobile.*', df1$Product)*2 + 1L

Or, as @nicola proposed:

tmp <- grepl('Mobile', df1$Product)
indx <- (grepl('Bill Payment', df1$Product) & tmp) + (grepl('Recharge', df1$Product) & tmp)*2 + 1L