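How to search for part of a string that is contained in a list of strings, and return the matched string in R

Asked: 2016-01-15 03:47:06

Tags: r

The following data frame contains a "Campaign" column whose values hold information about season, name, and position; however, the order of these pieces varies from row to row. Fortunately, each piece comes from a fixed list, so we can create vectors to match against the strings in the "Campaign" column.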

   Date           Campaign
1 Jan-15   Summer|Peter|Up
2 Feb-15 David|Winter|Down
3 Mar-15   Up|Peter|Spring

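Here is what I want to do: create three columns, Name, Season, and Position, each of which searches the string in the Campaign column and returns the matching value from the lists below.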

Name <- c("Peter", "David")
Season <- c("Summer","Spring","Autumn", "Winter")
Position <- c("Up","Down")

So the result I want would be:

Temp
    Date          Campaign  Name Season Position
1 15-Jan   Summer|Peter|Up Peter Summer       Up
2 15-Feb David|Winter|Down David Winter     Down
3 15-Mar   Up|Peter|Spring Peter Spring       Up

3 Answers:

Answer 0: (score: 3)

Another way:

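# Split each campaign string on the literal "|" character
# (as.character guards against Campaign being stored as a factor)
L <- strsplit(as.character(df$Campaign),split = '\\|')

# For each row, keep the piece that also appears in the lookup vector
df$Name <- sapply(L,intersect,Name)
df$Season <- sapply(L,intersect,Season)
df$Position <- sapply(L,intersect,Position)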
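
One caveat with `sapply(L, intersect, Name)`: it simplifies to a character vector only when every row matches exactly one value. A row with no match, or with several, makes it return a list instead. Below is a minimal sketch of a stricter variant. The helper name `pick_one` and the extra fourth row are made up for illustration, and returning `NA` for missing or ambiguous rows is an assumption about the desired behavior.

```r
# Sample data from the question, plus one hypothetical row without a known name
df <- data.frame(
  Date = c("Jan-15", "Feb-15", "Mar-15", "Apr-15"),
  Campaign = c("Summer|Peter|Up", "David|Winter|Down",
               "Up|Peter|Spring", "Autumn|Alice|Down"),
  stringsAsFactors = FALSE
)

Name     <- c("Peter", "David")
Season   <- c("Summer", "Spring", "Autumn", "Winter")
Position <- c("Up", "Down")

# Hypothetical helper: return the single element of `choices` found in `parts`,
# or NA when there is no match or more than one match
pick_one <- function(parts, choices) {
  hit <- intersect(parts, choices)
  if (length(hit) == 1L) hit else NA_character_
}

# Split on the literal "|" character
L <- strsplit(df$Campaign, split = "|", fixed = TRUE)

# vapply() enforces one character value per row
df$Name     <- vapply(L, pick_one, character(1), choices = Name)
df$Season   <- vapply(L, pick_one, character(1), choices = Season)
df$Position <- vapply(L, pick_one, character(1), choices = Position)
```

With this variant the column type stays character regardless of the input, and rows that cannot be resolved are easy to find with `is.na()`.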

Answer 1: (score: 2)

Do the following:

Date = c("Jan-15","Feb-15","Mar-15")
Campaign = c("Summer|Peter|Up","David|Winter|Down","Up|Peter|Spring")
df = data.frame(Date,Campaign)

Name <- c("Peter", "David")
Season <- c("Summer","Spring","Autumn", "Winter")
Position <- c("Up","Down")

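# Start each new column as NA so that rows without a match stay NA
df$Name <- NA_character_
df$Season <- NA_character_
df$Position <- NA_character_

# grepl() flags the rows whose Campaign contains k; those rows are set to k
for(k in Name){
    df$Name[grepl(pattern = k, x = df$Campaign)] <- k
}

for(k in Season){
    df$Season[grepl(pattern = k, x = df$Campaign)] <- k
}

for(k in Position){
    df$Position[grepl(pattern = k, x = df$Campaign)] <- k
}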

This gives:

> df
    Date          Campaign  Name Season Position
1 Jan-15   Summer|Peter|Up Peter Summer       Up
2 Feb-15 David|Winter|Down David Winter     Down
3 Mar-15   Up|Peter|Spring Peter Spring       Up
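
Note that `grepl(pattern = k, ...)` matches substrings, so a lookup value such as `Up` would also match a hypothetical campaign part like `Upton`. A possible vectorized alternative builds one alternation pattern with word boundaries. This is only a sketch: it assumes the lookup values contain no regex metacharacters, and the helper name `extract_first` and the fourth sample string are made up for this example.

```r
# Hypothetical helper: for each string in x, return the first whole-word
# occurrence of any value in `choices`, or NA if none is present
extract_first <- function(x, choices) {
  # e.g. "\\b(Up|Down)\\b"; assumes choices have no regex metacharacters
  pattern <- paste0("\\b(", paste(choices, collapse = "|"), ")\\b")
  m <- regexpr(pattern, x, perl = TRUE)
  out <- rep(NA_character_, length(x))
  # regmatches() returns only the matched pieces, so fill the matching rows
  out[m != -1L] <- regmatches(x, m)
  out
}

Campaign <- c("Summer|Peter|Up", "David|Winter|Down",
              "Up|Peter|Spring", "Upton|Autumn")

name_found     <- extract_first(Campaign, c("Peter", "David"))
season_found   <- extract_first(Campaign, c("Summer", "Spring", "Autumn", "Winter"))
position_found <- extract_first(Campaign, c("Up", "Down"))
```

Here the last string yields `NA` for the position because `Up` appears only as part of `Upton`, whereas a plain `grepl("Up", ...)` would have flagged it.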

Answer 2: (score: 2)

I had the same idea as Marat Talipov; here is a data.table option:

library(data.table)

Name     <- c("Peter", "David")
Season   <- c("Summer","Spring","Autumn", "Winter")
Position <- c("Up","Down")

dat <- data.table(Date=c("Jan-15", "Feb-15", "Mar-15"),
                  Campaign=c("Summer|Peter|Up", "David|Winter|Down", "Up|Peter|Spring"))

which gives

> dat
     Date          Campaign
1: Jan-15   Summer|Peter|Up
2: Feb-15 David|Winter|Down
3: Mar-15   Up|Peter|Spring

Then process it:

dat[ , `:=`(Name     = sapply(strsplit(Campaign, "|", fixed=TRUE), intersect, Name),
            Season   = sapply(strsplit(Campaign, "|", fixed=TRUE), intersect, Season),
            Position = sapply(strsplit(Campaign, "|", fixed=TRUE), intersect, Position))
    ]

Result:

> dat
     Date          Campaign  Name Season Position
1: Jan-15   Summer|Peter|Up Peter Summer       Up
2: Feb-15 David|Winter|Down David Winter     Down
3: Mar-15   Up|Peter|Spring Peter Spring       Up

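There may be some benefit to this if you are doing it across many columns or need to modify the table in place (by reference).

I would be interested if someone could show me how to update all three columns at once.

EDIT: Never mind, figured it out: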

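# Note: get(icol) is meant to fetch the global lookup vector of the same name;
# if dat already has a column called icol, get() finds that column instead
for (icol in c("Name", "Season", "Position"))
    dat[, (icol):=sapply(strsplit(Campaign, "|", fixed=TRUE), intersect, get(icol))]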
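
An alternative to `get(icol)` is to keep the lookup vectors in a named list and iterate over that, which also means the split happens only once. The following is a sketch in base R on a plain data frame rather than a data.table; the list name `lookups` is an assumption for illustration.

```r
df <- data.frame(
  Date = c("Jan-15", "Feb-15", "Mar-15"),
  Campaign = c("Summer|Peter|Up", "David|Winter|Down", "Up|Peter|Spring"),
  stringsAsFactors = FALSE
)

# Hypothetical named list: each name becomes a new column
lookups <- list(
  Name     = c("Peter", "David"),
  Season   = c("Summer", "Spring", "Autumn", "Winter"),
  Position = c("Up", "Down")
)

# Split every campaign string once, on the literal "|" character
parts <- strsplit(df$Campaign, "|", fixed = TRUE)

# Build all new columns in one assignment, one column per lookup vector
df[names(lookups)] <- lapply(lookups, function(choices) {
  sapply(parts, intersect, choices)
})
```

Adding a fourth attribute then only requires a new entry in `lookups`, with no name lookups through the environment.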