I have two tables:
tab1=structure(list(generated_id = c(482160724447511, 482160724447511
), utc_time = structure(c(1L, 1L), .Label = "30.09.2018 12:46", class = "factor"),
local_time = structure(c(1L, 1L), .Label = "30.09.2018 15:46", class = "factor"),
user_locale = structure(c(1L, 1L), .Label = "en", class = "factor"),
network = structure(c(1L, 1L), .Label = "Facebook Installs", class = "factor"),
campaign = structure(c(1L, 1L), .Label = "(GR23)(BGM)(AND)(FB)(App Events)(US)(W35+)(27.09.2018) (23843105742120752)", class = "factor"),
adgroup = structure(c(1L, 1L), .Label = "(GR23)(BGM)(AND)(FB)(META)(US)(W35+)(NONE)(APP_EV)(NONE)(PURCHASE)(NONE)(27.09.2018) (23843105743590752)", class = "factor")), class = "data.frame", row.names = c(NA,
-2L))
tab2=
structure(list(date = structure(c(1L, 1L), .Label = "10.10.2018", class = "factor"),
campaign_id = c(2.38431e+16, 2.38431e+16), ad_set_id = c(2.38431e+16,
2.38431e+16), spent = c(1.77, 13.85)), class = "data.frame", row.names = c(NA,
-2L))
tab2$campaign_id=tab1$campaign
tab2$ad_set_id=tab1$adgroup
Normally I merge with a simple function call:
merge(tab1, tab2, by = c("campaign", "adgroup"))
But in this case I am stuck, because the ID in tab1$campaign (and in tab1$adgroup) sits in parentheses at the end of the string:
(GR23)(BGM)(AND)(FB).... (***23843105743590752***)
(GR23)(BGM)(AND)(FB)(META)(US)(W35+)(NONE)(APP_EV)(NONE)(PURCHASE)(NONE)(27.09.2018) (***23843105743590752***)
where (***...***) marks the ID to merge on.
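Given that the key ID in tab1 is embedded in parentheses at the end of the campaign and adgroup strings, how can I merge tab1 and tab2 by campaign and ad group?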
Answer 0 (score: 1)
If I understand your question correctly, the problem is how to merge two tables on a substring of a column.
One way to do this is to extract that substring and add it to tab1 as a new column.
Because the rows in your tab1 are identical, and the ids in tab2 do not match any of those in tab1, I used a different data set:
tab1 <- structure(list(campaign = c("(GR23)(BGM)(AND)(FB)(App Events)(US)(W35+)(27.09.2018) (23843105742120752)",
"(GR23)(BGM)(AND)(FB)(App Events)(US)(W35+)(27.09.2018) (23843105742120753)"),
adgroup = c("(GR23)(BGM)(AND)(FB)(META)(US)(W35+)(NONE)(APP_EV)(NONE)(PURCHASE)(NONE)(27.09.2018) (23843105743590752)",
"(GR23)(BGM)(AND)(FB)(META)(US)(W35+)(NONE)(APP_EV)(NONE)(PURCHASE)(NONE)(27.09.2018) (23843105743590752)"),
generated_id = c(482160724447511, 482160724447511)),
row.names = c(NA, -2L), class = "data.frame")
tab2 <- structure(list(campaign_id = c("23843105742120752", "23843105742120753"),
ad_set_id = c("23843105743590752", "23843105743590752"),
date = c("10.10.2018", "10.10.2018"), spent = c(1.77, 13.85)),
row.names = c(NA, -2L), class = "data.frame")
# Create a function that extracts the id from the last part
extract_id <- function(x){
s <- strsplit(as.character(x), " ")
s_id <- sapply(s, function(si) si[length(si)])
ids <- gsub("[^[:digit:] ]", "", s_id) # Remove all but digits/numbers
return(ids)
}
# Add the extracted id's to tab1
tab1$campaign_id <- extract_id(tab1$campaign)
tab1$adgroup_id <- extract_id(tab1$adgroup)
# Your result
result <- merge(tab1, tab2,
by.x = c("campaign_id", "adgroup_id"),
by.y = c("campaign_id", "ad_set_id"))
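As an alternative to splitting on spaces, the extraction step can be sketched with one anchored regular expression. This is a minimal sketch, not part of the original answer: `extract_trailing_id` is a hypothetical helper name, and it assumes the ID is always a run of digits inside the last pair of parentheses of the string.

```r
# Hypothetical helper: return the digits inside the final "(...)" group,
# or NA when the string does not end with such a group.
extract_trailing_id <- function(x) {
  x <- as.character(x)                      # accept factors as well as character vectors
  pattern <- "^.*\\((\\d+)\\)\\s*$"         # greedy prefix, then "(digits)" at the end
  ids <- sub(pattern, "\\1", x, perl = TRUE)
  ids[!grepl(pattern, x, perl = TRUE)] <- NA_character_
  ids
}

campaign <- c(
  "(GR23)(BGM)(AND)(FB)(App Events)(US)(W35+)(27.09.2018) (23843105742120752)",
  "(GR23)(BGM)(AND)(FB)(App Events)(US)(W35+)(27.09.2018) (23843105742120753)"
)
extract_trailing_id(campaign)
```

Unlike stripping non-digits from the last space-separated token, this version returns NA for strings that have no trailing parenthesized number, which makes malformed rows easier to spot before merging.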
Note that, besides the different values, some columns also have different types: they are character rather than factor.
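On the point about types, there is a further reason to keep the ids as character: in the question's tab2 they are stored as numbers (2.38431e+16), and R doubles represent integers exactly only up to 2^53, which is smaller than these 17-digit ids. The snippet below is a small illustration under that assumption. `id_a` and `id_b` are hypothetical names for two ids taken from the example data.

```r
# Doubles hold integers exactly only up to 2^53 (about 9.0e15);
# beyond that, adding 1 is lost to rounding.
2^53 + 1 == 2^53

# Two ids from the example data that differ only in the last digit.
id_a <- "23843105742120752"
id_b <- "23843105742120753"

# As character strings they are compared digit by digit and stay distinct.
id_a == id_b

# Both ids are larger than 2^53, so a numeric column cannot be
# relied on to preserve every digit.
nchar(id_a)
as.numeric(id_a) > 2^53
```

Reading the id columns as character from the start (for example with `colClasses = "character"` in `read.csv`) avoids the problem before the merge.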