Create a new column based on string values in another column in R

Time: 2019-08-05 20:18:25

Tags: r grep

I have a data frame in R in which one column is a long string. I want to use that string to create a new column with specific values.

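Here is a sample data frame:

dom <- data.frame(
  Site = c("alpha", "beta", "charlie", "delta"),
  Banner = c("testing_Watermelon -DPI_300x250 v2", "notest_Vanilla Latte-DPI_300x250 v2", "bottle :15s", "aaaa vvvv cccc Build_Mobile_320x480")
)

Now, if the string in the Banner column contains Watermelon, the new column label should contain just Watermelon (and likewise Vanilla for Vanilla); otherwise it should be Default. Below is what the expected data frame should look like.

     Site                              Banner      label
1   alpha  testing_Watermelon -DPI_300x250 v2 Watermelon
2    beta notest_Vanilla Latte-DPI_300x250 v2    Vanilla
3 charlie                         bottle :15s    Default
4   delta aaaa vvvv cccc Build_Mobile_320x480    Default

How can I use grep, or any other approach, to handle multiple conditions like this?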

4 answers:

Answer 0 (score: 4)

library(stringr)
dom$label = str_extract(dom$Banner, "Watermelon|Vanilla")
dom$label[is.na(dom$label)] <- "Default"
dom
#      Site                              Banner      label
# 1   alpha  testing_Watermelon -DPI_300x250 v2 Watermelon
# 2    beta notest_Vanilla Latte-DPI_300x250 v2    Vanilla
# 3 charlie                         bottle :15s    Default
# 4   delta aaaa vvvv cccc Build_Mobile_320x480    Default

Answer 1 (score: 0)

Here is a simple solution using base R:

# Sample data:
dom <- data.frame(
  Site = c("alpha", "beta", "charlie", "delta"),
  Banner = c("testing_Watermelon -DPI_300x250 v2", "notest_Vanilla Latte-DPI_300x250 v2", "bottle :15s", "aaaa vvvv cccc Build_Mobile_320x480")
)

# Check the patterns in order; the first one that matches sets the label
dom$label <- ifelse(grepl("watermelon", dom$Banner, ignore.case = TRUE), "Watermelon",
                    ifelse(grepl("vanilla", dom$Banner, ignore.case = TRUE), "Vanilla", "Default"))
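If the list of keywords grows, the nested ifelse() calls above become hard to read. The following is a minimal sketch of one way to generalize the same idea in base R. The helper name label_by_keyword and its arguments are hypothetical (they are not part of the original answer), and each keyword is treated as a regular expression.

```r
# Sample data from the question
dom <- data.frame(
  Site = c("alpha", "beta", "charlie", "delta"),
  Banner = c("testing_Watermelon -DPI_300x250 v2",
             "notest_Vanilla Latte-DPI_300x250 v2",
             "bottle :15s",
             "aaaa vvvv cccc Build_Mobile_320x480"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: `keywords` is given in priority order (first wins).
# Each keyword is used as a regular expression, matched case-insensitively.
label_by_keyword <- function(x, keywords, default = "Default") {
  out <- rep(default, length(x))
  # Go from lowest to highest priority so that higher-priority
  # keywords overwrite lower-priority ones
  for (kw in rev(keywords)) {
    out[grepl(kw, x, ignore.case = TRUE)] <- kw
  }
  out
}

dom$label <- label_by_keyword(dom$Banner, c("Watermelon", "Vanilla"))
dom
```

Adding another label then only means adding another element to the keyword vector, at the position that reflects its priority.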

Answer 2 (score: 0)

One base R possibility could be:

labels <- paste(c("Watermelon", "Orange"), collapse = "|")

dom$label <- sapply(regmatches(dom$Banner, regexec(labels, dom$Banner)), "[", 1)
dom$label[is.na(dom$label)] <- "Default"

     Site                              Banner      label
1   alpha  testing_Watermelon -DPI_300x250 v2 Watermelon
2    beta  notest_Orange Latte-DPI_300x250 v2     Orange
3 charlie                         bottle :15s    Default
4   delta aaaa vvvv cccc Build_Mobile_320x480    Default
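To see why the replacement of NA with "Default" is needed, it may help to look at the intermediate result. The sketch below uses a shortened, two-element version of the Banner column (an assumption made only for illustration) and shows that regmatches() returns an empty character vector for strings without a match, which turns into NA when indexed at position 1.

```r
# Two banners: one with a keyword, one without (shortened example data)
banner <- c("testing_Watermelon -DPI_300x250 v2", "bottle :15s")
labels <- paste(c("Watermelon", "Orange"), collapse = "|")

# regexec() finds the first match per string; regmatches() extracts it
matches <- regmatches(banner, regexec(labels, banner))

# matches is a list: a length-1 vector for the first banner,
# and character(0) for the second, which has no match
lengths(matches)

# Taking element 1 of an empty character vector gives NA,
# so non-matching rows end up as NA in the result
first <- sapply(matches, "[", 1)
is.na(first)
```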

The same approach can also be used with dplyr and tidyr:

library(dplyr)
library(tidyr)

dom %>%
  mutate(label = sapply(regmatches(Banner, regexec(labels, Banner)), "[", 1),
         label = replace_na(label, "Default"))

Sample data:

dom <- data.frame(
 Site = c("alpha", "beta", "charlie", "delta"),
 Banner = c("testing_Watermelon -DPI_300x250 v2"   , "notest_Orange Latte-DPI_300x250 v2" , "bottle :15s","aaaa vvvv cccc Build_Mobile_320x480")
)

Answer 3 (score: 0)

library(dplyr)
library(stringi)

dom %>% mutate(label = case_when(stri_detect_fixed(Banner, "Watermelon") ~ "Watermelon",
                                 stri_detect_fixed(Banner, "Vanilla")    ~ "Vanilla",
                                                                   TRUE  ~ "Default"))
#>      Site                              Banner          label
#> 1   alpha  testing_Watermelon -DPI_300x250 v2     Watermelon
#> 2    beta notest_Vanilla Latte-DPI_300x250 v2        Vanilla
#> 3 charlie                         bottle :15s        Default
#> 4   delta aaaa vvvv cccc Build_Mobile_320x480        Default
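One difference between the approaches in this thread only shows up when a banner contains more than one keyword. The sketch below uses base R stand-ins and a hypothetical banner that is not in the original data: a regex alternation returns whichever keyword appears first in the string, while ordered conditions (nested ifelse() or case_when()) return whichever condition is listed first.

```r
# Hypothetical banner containing both keywords (not in the original data)
banner <- "promo_Vanilla and Watermelon_300x250"

# Regex alternation: the leftmost match in the string is returned
by_position <- regmatches(banner, regexpr("Watermelon|Vanilla", banner))

# Ordered conditions: the first condition that is TRUE decides the label
by_priority <- ifelse(grepl("Watermelon", banner, fixed = TRUE), "Watermelon",
               ifelse(grepl("Vanilla", banner, fixed = TRUE), "Vanilla", "Default"))

by_position  # "Vanilla"
by_priority  # "Watermelon"
```

For the sample data in the question every banner contains at most one keyword, so all of the answers give the same labels there.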

Data:

dom <- data.frame(Site = c("alpha", "beta", "charlie", "delta"),
                  Banner = c("testing_Watermelon -DPI_300x250 v2",
                             "notest_Vanilla Latte-DPI_300x250 v2",
                             "bottle :15s",
                             "aaaa vvvv cccc Build_Mobile_320x480"))