Suppose I have a data frame df:
Product Category
Bill Payment for Torrent Power Limited
Recharge of Videocon d2h DTH
Bill Payment of Airtel Mobile
Recharge of Idea Mobile
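Now, if a string contains both "Bill Payment" and "Mobile", I want to tag its Category as "Postpaid"; if a string contains both "Recharge" and "Mobile", I want to tag it as "Prepaid".
I am a beginner in R, so the easiest-to-understand solution is preferred.
The result should be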
Product Category
Bill Payment for Torrent Power Limited NA
Recharge of Videocon d2h DTH NA
Bill Payment of Airtel Mobile Postpaid
Recharge of Idea Mobile Prepaid
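Answer 0 (score: 4)
We can use grepl to find the rows of 'Product' that contain both 'Bill Payment' and 'Mobile' ('i1'), or both 'Recharge' and 'Mobile' ('i2'). After initializing 'Category' to NA, we replace the elements selected by the logical indices i1 and i2.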
i1 <- grepl('Bill Payment', df1$Product) & grepl('Mobile', df1$Product)
i2 <- grepl('Recharge', df1$Product) & grepl('Mobile', df1$Product)
df1$Category <- NA
df1$Category[i1] <- 'Postpaid'
df1$Category[i2] <- 'Prepaid'
df1$Category
#[1] NA NA "Postpaid" "Prepaid"
Or a slightly more compact option (it works for this example, where 'Mobile' always comes after the keyword) is
i1 <- grepl('.*Bill Payment.*Mobile.*', df1$Product)
i2 <- grepl('.*Recharge.*Mobile.*', df1$Product)
and then use ifelse to assign the labels.
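The ifelse step is not spelled out in the answer. A minimal sketch of what it could look like follows; the data.frame() call that rebuilds df1 is an assumption about how the example data was created, and nesting two ifelse calls is just one way to combine the two indices.

```r
# Example data rebuilt from the question (the construction is an assumption)
df1 <- data.frame(
  Product = c("Bill Payment for Torrent Power Limited",
              "Recharge of Videocon d2h DTH",
              "Bill Payment of Airtel Mobile",
              "Recharge of Idea Mobile"),
  stringsAsFactors = FALSE
)

# Compact patterns from the answer; they assume 'Mobile' follows the keyword
i1 <- grepl('.*Bill Payment.*Mobile.*', df1$Product)
i2 <- grepl('.*Recharge.*Mobile.*', df1$Product)

# Nested ifelse: 'Postpaid' where i1, 'Prepaid' where i2, NA otherwise
df1$Category <- ifelse(i1, 'Postpaid', ifelse(i2, 'Prepaid', NA_character_))
df1$Category
```

Using NA_character_ rather than a bare NA keeps the inner result a character vector from the start, so no type coercion is left to chance.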
Answer 1 (score: 3)
Another approach is to first create a numeric index and then use it to look up the corresponding values:
indx <- (grepl('Bill Payment', df1$Product) & grepl('Mobile', df1$Product)) +
(grepl('Recharge', df1$Product) & grepl('Mobile', df1$Product))*2 + 1L
df1$category <- c(NA, "Postpaid", "Prepaid")[indx]
This gives:
> df1
Product category
1 Bill Payment for Torrent Power Limited <NA>
2 Recharge of Videocon d2h DTH <NA>
3 Bill Payment of Airtel Mobile Postpaid
4 Recharge of Idea Mobile Prepaid
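The index arithmetic relies on R coercing TRUE/FALSE to 1/0, so a postpaid match contributes 1, a prepaid match contributes 2, and the trailing 1L shifts the result into the positions 1 to 3 of the lookup vector. A small sketch with hypothetical vectors postpaid and prepaid (stand-ins for the two grepl conditions) shows the three cases:

```r
# One element per case: no match, postpaid match only, prepaid match only
postpaid <- c(FALSE, TRUE, FALSE)
prepaid  <- c(FALSE, FALSE, TRUE)

# Logicals are coerced to 0/1 in arithmetic: 0+0+1, 1+0+1, 0+2+1
indx <- postpaid + prepaid * 2 + 1L
indx

# Position 1 is NA, position 2 is "Postpaid", position 3 is "Prepaid"
c(NA, "Postpaid", "Prepaid")[indx]
```

If a string matched both conditions the index would be 4, which lies beyond the lookup vector and therefore also yields NA.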
You can also create this index with the more compact notation suggested by @akrun:
indx <- grepl('.*Bill Payment.*Mobile.*', df1$Product) +
grepl('.*Recharge.*Mobile.*', df1$Product)*2 + 1L
Or, as @nicola suggested:
tmp <- grepl('Mobile', df1$Product)
indx <- (grepl('Bill Payment', df1$Product) & tmp) + (grepl('Recharge', df1$Product) & tmp)*2 + 1L