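I wrote some code that matches different string values and, when one matches, replaces the string value with the matched one.
I have a data frame and an array. The data frame comes first:
df1 <- data.frame(campaign_source=c("googleadwords", "google display" ,"twitter banner", "facebook-post", "facebook like","inmobi","organic"),cost=c(4,2,3,4,5,6,7))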
source<-c("google","facebook","twitter")
The goal is to create a new column in df1 whose value is whichever entry of source matches the text in df1$campaign_source, so I used:
df1$n_campaign_source <- "other"
for (k in seq_len(nrow(df1))) {
  for (i in seq_along(source)) {
    h <- df1$campaign_source[k]
    h1 <- df1$n_campaign_source[k]
    # Assign only while the row still carries the default label;
    # the original test (h1 != 'other') could never be TRUE.
    if (grepl(source[i], h) && h1 == "other") {
      df1$n_campaign_source[k] <- source[i]
    }
  }
}
This takes a long time to run; any faster solution would be appreciated. Final output:
no  campaign_source  cost  n_campaign_source
1   googleadwords    4     google
2   google display   2     google
3   twitter banner   3     facebook
4   facebook-post    4     facebook
5   facebook like    5     twitter
6   inmobi           6     other
7   organic          7     other
Answer 0 (score: 1)
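(The desired output shown in the question appears to be incorrect.) Try this alternative code, which uses the grep results as an index for assignment: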
df1$source <- NA
for( item in source ) df1$source[grep(item, df1$campaign_source)] <- item
df1$source[is.na(df1$source)] <- "other"
df1
#-----------------
  campaign_source cost   source
1  google adwords    4   google
2  google display    2   google
3  twitter banner    3  twitter
4   facebook post    4 facebook
5   facebook like    5 facebook
6          inmobi    6    other
7         organic    7    other
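
If the loop over source is itself a concern, the same idea can be written without any explicit loop. The following is only a sketch, not part of the answer above: it assumes the entries of source contain no regex metacharacters, and it labels each row with whichever source appears first in the string rather than first in the source vector. The names pattern and m are illustrative.

```r
df1 <- data.frame(
  campaign_source = c("googleadwords", "google display", "twitter banner",
                      "facebook-post", "facebook like", "inmobi", "organic"),
  cost = c(4, 2, 3, 4, 5, 6, 7),
  stringsAsFactors = FALSE
)
source <- c("google", "facebook", "twitter")

# Build one alternation pattern: "google|facebook|twitter"
# (assumes the source entries are plain text, not regex syntax)
pattern <- paste(source, collapse = "|")

# regexpr() gives the position of the first match per string, -1 if none
m <- regexpr(pattern, df1$campaign_source)

# Start from the default label, then overwrite only the rows that matched;
# regmatches() returns one substring per matching row, in row order
df1$n_campaign_source <- "other"
df1$n_campaign_source[m != -1] <- regmatches(df1$campaign_source, m)

df1
```

Because the whole column is handled in a single regexpr() call, the cost no longer grows with the number of entries in source.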
Answer 1 (score: 0)
Here is an alternative solution using strsplit:
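df1$source <- vapply(as.character(df1$campaign_source), function(x) {
  # split the campaign label into words on spaces
  w <- unlist(strsplit(x, " ", fixed = TRUE))
  hits <- w[w %in% source]
  # keep the first matching word, or fall back to "other"
  if (length(hits) > 0) hits[1] else "other"
}, character(1), USE.NAMES = FALSE)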
#   campaign_source cost   source
# 1  google adwords    4   google
# 2  google display    2   google
# 3  twitter banner    3  twitter
# 4   facebook post    4 facebook
# 5   facebook like    5 facebook
# 6          inmobi    6    other
# 7         organic    7    other
# Sample data used for the output above
df1 <- data.frame(campaign_source=c("google adwords", "google display", "twitter banner", "facebook post", "facebook like", "inmobi", "organic"), cost=c(4,2,3,4,5,6,7))
source <- c("google", "facebook", "twitter")
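
One caveat with the strsplit approach: it only recognises a source when it appears as a whole space-separated word, so values such as "googleadwords" or "facebook-post" from the question's data would fall through to "other". The sketch below contrasts whole-word matching with plain substring matching; the names campaigns, by_word, hit and by_substring are made up for illustration, and it assumes the first entry of source that matches should win.

```r
campaigns <- c("googleadwords", "facebook-post", "google display")
source <- c("google", "facebook", "twitter")

# Whole-word matching after splitting on spaces (the strsplit idea)
by_word <- vapply(campaigns, function(x) {
  w <- unlist(strsplit(x, " ", fixed = TRUE))
  hits <- w[w %in% source]
  if (length(hits) > 0) hits[1] else "other"
}, character(1), USE.NAMES = FALSE)

# Substring matching: a logical matrix with one row per campaign
# and one column per source
hit <- vapply(source, function(s) grepl(s, campaigns, fixed = TRUE),
              logical(length(campaigns)))

# For each row take the first source that matched, else "other"
by_substring <- apply(hit, 1, function(r) {
  if (any(r)) source[which(r)[1]] else "other"
})

by_word       # "other"  "other"    "google"
by_substring  # "google" "facebook" "google"
```

Which behaviour is wanted depends on the data: whole-word matching avoids accidental hits inside longer words, while substring matching copes with labels that have no separating space.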