我已经设法在excel中轻松完成了这项工作,但希望在R:
中完成此操作我有两个数据框:
MediaPlanDF(215 obs,29 var)
我只关心4个变量:
开始日期(航班日期), 结束日期(航班日期), 每日支出, 每日展示次数
OutputDF(35 obs,1 var)
日期:OutputDF [[1]]
从2015年8月31日到10/4/15的35次观察
这就是设置,或至少我如何设置(所有日期格式正确)。
我需要做的是基本上为ouputDF添加2列:Daily Impressions&每日支出
每日展示次数=如果OutputDF中的日期属于MediaPlanDF的开始日期和结束日期,则总结符合该条件的所有#dailyimpressions。
每日展示次数=如果OutputDF中的日期属于MediaPlanDF的开始日期和结束日期,则总结符合该条件的所有#dailyspend。
以下是两个DF的示例:
MediaPlanDF(我担心的部分):
#daysinflight #dailyimpressions #dailyspend Campaign name Campaign ID Campaign flight start date Campaign flight end date
35 392857.1429 1571.428571 A Real Advertiser RAND0M 8/31/2015 10/4/2015
35 85714.28571 428.5714286 A Real Advertiser RAND0M 8/31/2015 10/4/2015
35 142857.1429 714.2857143 A Real Advertiser RAND0M 8/31/2015 10/4/2015
35 62857.14286 942.8571429 A Real Advertiser RAND0M 8/31/2015 10/4/2015
OutputDF
Date
8/31/2015
9/1/2015
9/2/2015
9/3/2015
9/4/2015
9/5/2015
...
10/4/2015
感谢您提前提供任何帮助。
我收到此错误:
as.POSIXlt.character(as.character(x),...)出错: 字符串不是标准的明确格式
这是dput
structure(list(site = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = c("Accuen",
"Amazon", "Bleacher Report", "Brightroll", "Buzzfeed", "CBSi",
"College Humor", "Complex", "ESPN", "GDN", "HULU", "IGN", "Millennial",
"Nativo", "NBA", "NBC Sports", "Pandora", "Reddit", "Spotify",
"Tremor", "TrueX", "Twitch", "Wikia", "Woven", "Yahoo!", "YouTube"
), class = "factor"), daysinflight = c(35L, 35L, 35L, 35L, 35L,
35L), dailyimpressions = c(392857.1429, 85714.28571, 142857.1429,
62857.14286, 17142.85714, 72380.94286), dailyspend = c(1571.428571,
428.5714286, 714.2857143, 942.8571429, 428.5714286, 1085.714286
), Campaign.name = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = "A Real Advertiser", class = "factor"),
Campaign.ID = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = "RAND0M", class = "factor"),
startdate = structure(c(16678, 16678, 16678, 16678, 16678,
16678), class = "Date"), enddate = structure(c(16712, 16712,
16712, 16712, 16712, 16712), class = "Date"), Campaign.budget = c(5100206L,
5100206L, 5100206L, 5100206L, 5100206L, 5100206L), Campaign.planned.cost = c(4663350.2,
4663350.2, 4663350.2, 4663350.2, 4663350.2, 4663350.2), Campaign.buy.total = c(4663350.2,
4663350.2, 4663350.2, 4663350.2, 4663350.2, 4663350.2), Supplier = structure(c(1L,
1L, 1L, 1L, 1L, 1L), .Label = c("ACCUEN", "AMAZON (AMZN MED GP)",
"BRIGHTROLL", "BUZZFEED.COM", "CBS DIGITAL MEDIA", "COLLEGE HUMOR",
"COMPLEX.COM", "ESPN.COM", "GOOGLE DISPLAY NTWK", "HULU",
"IGN.COM", "MILLENNIAL MEDIA", "NATIVO.NET", "NBC.COM", "PANDORA MEDIA, INC.",
"REDDIT.COM", "SPOTIFY.COM", "TREMORMEDIA.COM", "TRUEX MEDIA INC.",
"TURNER MEDIA GROUP", "TWITCH.TV", "WIKIA.COM", "WOVENDIGITAL.COM",
"YAHOO! US", "YOUTUBE, LLC."), class = "factor"), Site = structure(c(1L,
1L, 1L, 1L, 1L, 1L), .Label = c("ACCUEN", "AMAZON.COM", "BLEACHER REPORT",
"BRIGHTROLL", "BUZZFEED.COM", "CBS INTERACTIVE", "COLLEGEHUMOR",
"COMPLEX", "COMPLEX.COM", "ELECTUS", "ESPN.COM", "GOOGLE DISPLAY NTWK",
"HULU", "IGN.COM", "MILLENNIAL MEDIA", "NATIVO", "NBA", "NBCSPORTS.COM",
"PANDORA", "REDDIT", "SPOTIFY", "TREMOR VIDEO", "TRUEX",
"TWITCH", "WIKIA", "WOVEN", "YAHOO", "YOUTUBE, LLC."), class = "factor"),
Flight.start.date = structure(c(2L, 2L, 2L, 2L, 2L, 2L), .Label = c("10/3/2015",
"8/31/2015", "9/1/2015", "9/10/2015", "9/11/2015", "9/13/2015",
"9/14/2015", "9/15/2015", "9/16/2015", "9/17/2015", "9/18/2015",
"9/2/2015", "9/20/2015", "9/21/2015", "9/24/2015", "9/25/2015",
"9/27/2015", "9/28/2015", "9/7/2015", "9/9/2015"), class = "factor"),
Flight.end.date = structure(c(2L, 2L, 2L, 2L, 2L, 2L), .Label = c("10/3/2015",
"10/4/2015", "9/11/2015", "9/13/2015", "9/15/2015", "9/17/2015",
"9/18/2015", "9/2/2015", "9/20/2015", "9/27/2015", "9/30/2015"
), class = "factor"), Cost.method = structure(c(3L, 3L, 3L,
3L, 3L, 3L), .Label = c("CPC", "CPE", "CPM", "Flat", "Free"
), class = "factor"), Rate = c(43, 15, 5, 125, 25, 15), Planned.unit.amount = c(13750000L,
3000000L, 5000000L, 2200000L, 600000L, 2533333L), Cost = c(55000,
15000, 25000, 33000, 15000, 38000), Excluded = structure(c(1L,
1L, 1L, 1L, 1L, 1L), .Label = "No", class = "factor"), Company.ID = structure(c(1L,
1L, 1L, 1L, 1L, 1L), .Label = "skrt", class = "factor"),
Person.ID = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = "smgboi", class = "factor"),
Exported.by = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = "some guy", class = "factor"),
Exported.on = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = "2015-10-06 18:53:12, EDT", class = "factor"),
Exported.from = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = "media", class = "factor")), .Names = c("site",
"daysinflight", "dailyimpressions", "dailyspend", "Campaign.name",
"Campaign.ID", "startdate", "enddate", "Campaign.budget", "Campaign.planned.cost",
"Campaign.buy.total", "Supplier", "Site", "Placement.name", "Buy.details",
"Positioning", "Unit.dimensions", "Flight.start.date", "Flight.end.date",
"Cost.method", "Rate", "Planned.unit.amount", "Cost", "Excluded",
"Company.ID", "Person.ID", "Exported.by", "Exported.on", "Exported.from"
), row.names = c(NA, 6L), class = "data.frame")
答案 0 :(得分:2)
首先,我们需要确保您的日期格式等正确无误。我假设他们是,我的版本到底。您没有提供dput
并且有一些奇怪的列名称,因此请仔细检查间距和大写字母。
接下来,让我们为图书馆interval
中的每个广告系列构建一个lubridate
对象:
library(lubridate)
MediaPlanDF$interval <- interval(MediaPlanDF$Campaign.flight.1, MediaPlanDF$end.date)
现在我们可以测试outputDF中的每个项目是否在每个区间中,如果是,请求它(您的测试数据包含所有区间中的所有元素):
output <- do.call(rbind, lapply(OutputDF$Date, function(x){
index <- x %within% MediaPlanDF$interval;
list(impressions = sum(MediaPlanDF$dailyimpressions[index]),
spend = sum(MediaPlanDF$dailyspend[index]))}))
我们得到输出的地方:
cbind(output, OutputDF)
impressions spend Date
1 684285.7 3657.143 2015-08-31
2 684285.7 3657.143 2015-09-01
3 684285.7 3657.143 2015-09-02
4 684285.7 3657.143 2015-09-03
5 684285.7 3657.143 2015-09-04
6 684285.7 3657.143 2015-09-05
数据:
OutputDF:
structure(list(Date = structure(c(16678, 16679, 16680, 16681,
16682, 16683), class = "Date")), .Names = "Date", row.names = c(NA,
-6L), class = "data.frame")
MediaPlanDF:
structure(list(daysinflight = c(35L, 35L, 35L, 35L), dailyimpressions = c(392857.1429,
85714.28571, 142857.1429, 62857.14286), dailyspend = c(1571.428571,
428.5714286, 714.2857143, 942.8571429), Campaign.name = structure(c(1L,
1L, 1L, 1L), .Label = "A", class = "factor"), Campaign.ID = structure(c(1L,
1L, 1L, 1L), .Label = "Real", class = "factor"), Campaign.flight = structure(c(1L,
1L, 1L, 1L), .Label = "Advertiser", class = "factor"), start.date = structure(c(1L,
1L, 1L, 1L), .Label = "RAND0M", class = "factor"), Campaign.flight.1 = structure(c(16678,
16678, 16678, 16678), class = "Date"), end.date = structure(c(16712,
16712, 16712, 16712), class = "Date")), .Names = c("daysinflight",
"dailyimpressions", "dailyspend", "Campaign.name", "Campaign.ID",
"Campaign.flight", "start.date", "Campaign.flight.1", "end.date"
), row.names = c(NA, -4L), class = "data.frame")