I have a data frame of user campaign responses like the one below.
The columns are: email_address, response_date, campaign_name, state, suburb, postcode, magazine_subs_title, response_type
email_address response_date campaign_name state suburb postcode magazine_subs_title response_type
jow.wow@gmail.com 1/02/2018 18:01 2018_Beauty_Acq NSW Sydney 2000 AGL opened
dew.jones@yahoo.id 03/10/2017 14:00:00 2017_Fashion_Show QLD Brisbane 4000 MHI delivered
dew.jones@yahoo.id 03/10/2017 17:00:00 2017_Fashion_Show QLD Brisbane 4000 MHI opened
jow.wow@gmail.com 25/01/2018 9:00 2018_Beauty_Acq NSW Sydney 2000 AGL delivered
jow.wow@gmail.com 14/07/2017 11:00 2017_Fashion_Show NSW Sydney 2000 AGL delivered
From here, I want to extract the response_date where response_type = 'delivered', specific to each campaign, and end up with the following table:
email_address response_date campaign_name state suburb postcode magazine_subs_title response_type delivered_date
jow.wow@gmail.com 1/02/2018 18:01 2018_Beauty_Acq NSW Sydney 2000 AGL opened 25/01/2018 9:00
dew.jones@yahoo.id 03/10/2017 14:00:00 PM 2017_Fashion_Show QLD Brisbane 4000 MHI delivered 03/10/2017 14:00:00 PM
dew.jones@yahoo.id 03/10/2017 17:00:00 PM 2017_Fashion_Show QLD Brisbane 4000 MHI opened 03/10/2017 14:00:00 PM
jow.wow@gmail.com 25/01/2018 9:00 2018_Beauty_Acq NSW Sydney 2000 AGL delivered 25/01/2018 9:00
jow.wow@gmail.com 14/07/2017 11:00 2017_Fashion_Show NSW Sydney 2000 AGL delivered 14/07/2017 11:00
Does this make sense?
Does anyone know how to do this in R? Thanks
Answer 0 (score: 1)
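One approach is to use lubridate, tidyr and dplyr.
Start by preparing the data. Read the date and the time as separate columns, response_date and Time, and then unite them into a single response_date column. Next, use parse_date_time to convert that column to datetime format; this step is optional, since the OP does not make any decision based on the date. Finally, apply ifelse to populate delivered_date on the delivered rows, and fill to carry that value to the other rows of the same campaign.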
# Data: the date and the time are read as two separate columns
df <- read.table(text = "
email_address response_date Time campaign_name state suburb postcode magazine_subs_title response_type
jow.wow@gmail.com 1/02/2018 18:01 2018_Beauty_Acq NSW Sydney 2000 AGL opened
dew.jones@yahoo.id 03/10/2017 14:00:00 2017_Fashion_Show QLD Brisbane 4000 MHI delivered
dew.jones@yahoo.id 03/10/2017 17:00:00 2017_Fashion_Show QLD Brisbane 4000 MHI opened
jow.wow@gmail.com 25/01/2018 9:00 2018_Beauty_Acq NSW Sydney 2000 AGL delivered
jow.wow@gmail.com 14/07/2017 11:00 2017_Fashion_Show NSW Sydney 2000 AGL delivered", header = TRUE, stringsAsFactors = FALSE)
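library(lubridate)
library(dplyr)
library(tidyr)
df %>%
  # Merge the date and time columns back into a single response_date
  unite("response_date", c("response_date", "Time"), sep = " ") %>%
  # Parse as day/month/year, with or without seconds
  mutate(response_date = parse_date_time(response_date, c("dmy HMS", "dmy HM"))) %>%
  # Keep the timestamp on delivered rows only; everything else becomes NA
  mutate(delivered_date = ifelse(grepl("delivered", response_type), as.character(response_date), NA)) %>%
  group_by(campaign_name, state, suburb, postcode) %>%
  # Carry the delivered timestamp down to later rows of the same group
  fill(delivered_date) %>% ungroup() %>%
  as.data.frame()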
Result:
email_address response_date campaign_name state suburb postcode magazine_subs_title response_type delivered_date
#1 jow.wow@gmail.com 2017-07-14 11:00:00 2017_Fashion_Show NSW Sydney 2000 AGL delivered 2017-07-14 11:00:00
#2 dew.jones@yahoo.id 2017-10-03 14:00:00 2017_Fashion_Show QLD Brisbane 4000 MHI delivered 2017-10-03 14:00:00
#3 dew.jones@yahoo.id 2017-10-03 17:00:00 2017_Fashion_Show QLD Brisbane 4000 MHI opened 2017-10-03 14:00:00
#4 jow.wow@gmail.com 2018-02-01 18:01:00 2018_Beauty_Acq NSW Sydney 2000 AGL opened <NA>
#5 jow.wow@gmail.com 2018-01-25 09:00:00 2018_Beauty_Acq NSW Sydney 2000 AGL delivered 2018-01-25 09:00:00
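Row 4 above (the opened row for 2018_Beauty_Acq) ends up as NA because fill() fills downward by default, and in the input that opened row comes before the delivered row of its group. A minimal sketch of one way around this, assuming tidyr 1.0.0 or later, where fill() accepts .direction = "downup". Two things here are my own assumptions rather than part of the answer above: email_address is added to the grouping so that one recipient's delivery time is never copied to another recipient, and format() is used with an explicit pattern so the text form of the timestamp does not depend on the R version.

```r
library(lubridate)
library(dplyr)
library(tidyr)

# Same sample data as above, with date and time read as separate columns
df <- read.table(text = "
email_address response_date Time campaign_name state suburb postcode magazine_subs_title response_type
jow.wow@gmail.com 1/02/2018 18:01 2018_Beauty_Acq NSW Sydney 2000 AGL opened
dew.jones@yahoo.id 03/10/2017 14:00:00 2017_Fashion_Show QLD Brisbane 4000 MHI delivered
dew.jones@yahoo.id 03/10/2017 17:00:00 2017_Fashion_Show QLD Brisbane 4000 MHI opened
jow.wow@gmail.com 25/01/2018 9:00 2018_Beauty_Acq NSW Sydney 2000 AGL delivered
jow.wow@gmail.com 14/07/2017 11:00 2017_Fashion_Show NSW Sydney 2000 AGL delivered", header = TRUE, stringsAsFactors = FALSE)

result <- df %>%
  unite("response_date", c("response_date", "Time"), sep = " ") %>%
  mutate(response_date = parse_date_time(response_date, c("dmy HMS", "dmy HM"))) %>%
  # Timestamp as text on delivered rows, NA elsewhere
  mutate(delivered_date = ifelse(response_type == "delivered",
                                 format(response_date, "%Y-%m-%d %H:%M:%S"),
                                 NA)) %>%
  # Assumption: a delivery belongs to one recipient within one campaign
  group_by(email_address, campaign_name) %>%
  # Fill down first, then up, so rows before the delivered row are covered too
  fill(delivered_date, .direction = "downup") %>%
  ungroup() %>%
  as.data.frame()

print(result)
```

With this sample every row gets a delivered_date. If a group could contain more than one delivered row, "downup" would pick the nearest delivered row above each row, which may or may not be what is wanted.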