Replace variable values that contain a certain string pattern

Posted: 2019-02-01 14:05:19

Tags: r dplyr

I have data like this:

library(dplyr)
glimpse(full_dat)
Observations: 9,720
Variables: 6
$ Product <chr> "Apple iPhone 4s 8GB Unlocked GSM Smartphone w/ S...
$ Brand   <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N...
$ Price   <dbl> 115, 115, 115, 115, 115, 115, 115, 115, 115, 115,...
$ Rating  <dbl> 5, 1, 4, 5, 5, 3, 5, 5, 5, 1, 5, 5, 1, 5, 2, 5, 5...
$ Reviews <chr> "It was new and at a great price! Phone came real...
$ Votes   <dbl> 2, 1, 0, 1, 2, 2, 2, 5, 2, 0, 0, 0, 0, 0, 0, 0, 0...

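I want to change the values of the variable Product based on a string. For example, if a value contains the pattern "iPhone 4s", I want to replace the whole value with just "iPhone 4s".

Desired result (mock-up):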

glimpse(full_dat)
Observations: 9,720
Variables: 6
$ Product <chr> "iPhone 4s", "iPhone 4s", "iPhone 4s", "iPhone 4s...
$ Brand   <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N...
$ Price   <dbl> 115, 115, 115, 115, 115, 115, 115, 115, 115, 115,...
$ Rating  <dbl> 5, 1, 4, 5, 5, 3, 5, 5, 5, 1, 5, 5, 1, 5, 2, 5, 5...
$ Reviews <chr> "It was new and at a great price! Phone came real...
$ Votes   <dbl> 2, 1, 0, 1, 2, 2, 2, 5, 2, 0, 0, 0, 0, 0, 0, 0, 0...

I read a similar post that proposed the following solution:

full_dat %>% 
  mutate_at(vars(contains('iphone 4s')), funs(. == 'iphone 4s'))

However, this doesn't work in my case: the values stay the same.

Here is a small sample:
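A likely explanation, sketched on a hypothetical two-row data frame `toy` (not from the question): `contains()` is a tidyselect helper that matches column *names*, not cell values, so `vars(contains('iphone 4s'))` selects no columns and `mutate_at()` has nothing to modify. Even with the right column selected, `. == '...'` is a comparison, so it would turn Product into TRUE/FALSE values instead of replacing text.

```r
library(dplyr)

# Hypothetical toy data; only the column names matter for select helpers
toy <- data.frame(Product = c("Apple iPhone 4s 8GB", "Galaxy S3"),
                  Price   = c(115, 90),
                  stringsAsFactors = FALSE)

# contains() looks at column names ("Product", "Price"), so nothing matches
sel <- select(toy, contains("iphone 4s"))
ncol(sel)  # 0

# Selecting Product explicitly shows the second problem: `==` returns logicals
cmp <- mutate_at(toy, vars(Product), ~ . == "iPhone 4s")
class(cmp$Product)  # "logical"
```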

product <- full_dat$Product[1:5]
dput(product)

c("Apple iPhone 4s 8GB Unlocked GSM Smartphone w/ Siri, iCloud and 8MP Camera - Black", 
"Apple iPhone 4s 8GB Unlocked GSM Smartphone w/ Siri, iCloud and 8MP Camera - Black", 
"Apple iPhone 4s 8GB Unlocked GSM Smartphone w/ Siri, iCloud and 8MP Camera - Black", 
"Apple iPhone 4s 8GB Unlocked GSM Smartphone w/ Siri, iCloud and 8MP Camera - Black", 
"Apple iPhone 4s 8GB Unlocked GSM Smartphone w/ Siri, iCloud and 8MP Camera - Black"
)
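To make the sample reproducible as a data frame, a minimal sketch follows. The name `samp` is an assumption: it is the name the answer below appears to use for a sample data frame with a Product column.

```r
# The five Product values from the dput() output above (all identical)
product <- rep("Apple iPhone 4s 8GB Unlocked GSM Smartphone w/ Siri, iCloud and 8MP Camera - Black", 5)

# Wrap them in a one-column data frame; `samp` is a hypothetical name
samp <- data.frame(Product = product, stringsAsFactors = FALSE)
str(samp)
```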

1 answer:

Answer 0 (score: 1)

I think you are looking for:

library(dplyr)

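full_dat %>%
  # replace() overwrites the elements where grepl() finds the pattern
  mutate(Product = replace(Product, grepl('iPhone 4s', Product), 'iPhone 4s'))

This uses replace() to change every Product value that contains "iPhone 4s" to just "iPhone 4s".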
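Note that grepl() is case-sensitive by default, while the question writes the pattern as "iphone 4s". A sketch of a case-insensitive variant using `ignore.case = TRUE` and dplyr's `if_else()`; the third row ("Samsung Galaxy S3 16GB") is a made-up, non-matching example, not from the question's data.

```r
library(dplyr)

samp <- data.frame(
  Product = c("Apple iPhone 4s 8GB Unlocked GSM Smartphone w/ Siri, iCloud and 8MP Camera - Black",
              "Apple iPhone 4s 8GB Unlocked GSM Smartphone w/ Siri, iCloud and 8MP Camera - Black",
              "Samsung Galaxy S3 16GB"),  # hypothetical non-matching row
  stringsAsFactors = FALSE
)

out <- samp %>%
  # Match "iphone 4s" regardless of case; keep non-matching values unchanged
  mutate(Product = if_else(grepl("iphone 4s", Product, ignore.case = TRUE),
                           "iPhone 4s", Product))
out$Product
```

The first two values become "iPhone 4s" and the made-up Galaxy row is left as-is.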

Of course, you can also do this without dplyr:

full_dat$Product <- with(full_dat, replace(Product, grepl('iPhone 4s', Product), 'iPhone 4s'))
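In base R, replace(x, list, values) simply assigns `values` to `x[list]`, so the same result can be written as a direct subset assignment. A sketch on a hypothetical two-row stand-in for `full_dat` (the Galaxy row is made up); `fixed = TRUE` treats the pattern as a literal string rather than a regular expression.

```r
# Hypothetical stand-in for full_dat
full_dat <- data.frame(
  Product = c("Apple iPhone 4s 8GB Unlocked GSM Smartphone w/ Siri, iCloud and 8MP Camera - Black",
              "Samsung Galaxy S3 16GB"),
  stringsAsFactors = FALSE
)

# Logical index of rows whose Product mentions the model (literal match)
hits <- grepl("iPhone 4s", full_dat$Product, fixed = TRUE)

# Overwrite only those rows
full_dat$Product[hits] <- "iPhone 4s"
full_dat$Product
```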