Add a calculated column to a data frame based on another data frame

Time: 2015-06-15 05:39:23

Tags: r

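I have two data frames in R:

  1. tvNationalSale: each row is one TV ad placement
  2. workingNational: each row is the total number of web sessions for one minute

I want to add a calculated column to tvNationalSale containing the sum of sessions in the 5 minutes before each ad placement. I use the dplyr package for basic formatting.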

    > glimpse(tvNationalSale)
    Observations: 1443
    Variables:
    $ Sort.Date        (fctr) 5/8/2015, 5/8/2015, 5/8/2015, 5/8/2015, 5/8/2015, 5/8/2015, 5/8/2015, 5/8...
    $ Before.Time      (time) 2015-08-05 06:03:00, 2015-08-05 21:12:00, 2015-08-05 08:49:00, 2015-08-05...
    $ Ad.Time          (time) 2015-08-05 06:08:00, 2015-08-05 21:17:00, 2015-08-05 08:54:00, 2015-08-05...
    $ After.Time       (time) 2015-08-05 06:13:00, 2015-08-05 21:22:00, 2015-08-05 08:59:00, 2015-08-05...
    $ Market.Long.Desc (fctr) National, National, National, National, National, National, National, Nat...
    $ Campaign.Name    (fctr) europe-sale, europe-sale, europe-sale, europe-sale, europe-sale, europe-s...
    
    > glimpse(workingNational)
    Observations: 44616
    Variables:
    $ date     (date) 2015-05-01, 2015-05-01, 2015-05-01, 2015-05-01, 2015-05-01, 2015-05-01, 2015-05-0...
    $ hour     (fctr) 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
    $ minute   (fctr) 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22,...
    $ sessions (dbl) 161, 71, 65, 58, 63, 58, 56, 41, 56, 45, 58, 57, 37, 48, 37, 41, 43, 44, 36, 38, 4...
    $ time     (chr) "01:01:00", "01:02:00", "01:03:00", "01:04:00", "01:05:00", "01:06:00", "01:07:00"...
    $ datetime (time) 2015-05-01 01:01:00, 2015-05-01 01:02:00, 2015-05-01 01:03:00, 2015-05-01 01:04:0...
    

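This example shows how to compute a period metric within a single data frame, but I can't figure out how to compute a similar metric from a separate data frame.

I tried the code below, which I suspect doesn't work because I'm trying to reference a separate data frame inside the mutate() command.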

    tvNationalSale <- tvNationalSale %>%
    mutate(Before.Sessions=sum(filter(workingNational, datetime>=tvNationalSale$Before.Time & datetime<=tvNationalSale$Ad.Time)$sessions))
    

Any ideas on how to add a calculated metric from another data frame?

1 Answer:

Answer 0: (score: 0)

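Assuming your workingNational data has no gaps or other irregularities, you can look up the position of each ad time in workingNational and then sum the five entries leading up to that time: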

indices <- match(tvNationalSale$Ad.Time, workingNational$datetime)
tvNationalSale$fiveMinutesBefore <- rowSums(sapply(1:5, function(x) workingNational$sessions[indices-x]))
head(tvNationalSale)
#               Ad.Time fiveMinutesBefore
# 1 2015-01-03 04:02:00              3126
# 2 2015-01-05 02:57:00              2221
# 3 2015-01-04 14:53:00              4269
# 4 2015-01-07 01:17:00              1916
# 5 2015-01-06 15:37:00              2484
# 6 2015-01-03 14:23:00              3092

Data:

set.seed(144)
workingNational=data.frame(datetime=seq(from=ISOdate(2015, 1, 1), to=ISOdate(2015, 1, 8), by="min"))
workingNational$sessions <- sample(1:1000, nrow(workingNational), replace=TRUE)
tvNationalSale=data.frame(Ad.Time=sample(workingNational$datetime, 100))
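
If workingNational might have missing minutes, or an ad time might not appear exactly in workingNational$datetime, one alternative is to filter by time window instead of by row position. The following is only a minimal sketch under that assumption. The helper name sessions_before and its window_secs parameter are hypothetical, introduced here for illustration. It sums sessions whose datetime falls in the interval [Ad.Time - 5 min, Ad.Time), which matches the five rows used above when the data is complete.

```r
# Same simulated data as above
set.seed(144)
workingNational <- data.frame(
  datetime = seq(from = ISOdate(2015, 1, 1), to = ISOdate(2015, 1, 8), by = "min")
)
workingNational$sessions <- sample(1:1000, nrow(workingNational), replace = TRUE)
tvNationalSale <- data.frame(Ad.Time = sample(workingNational$datetime, 100))

# Hypothetical helper: total sessions in the window_secs seconds before ad_time.
# The window is half-open: it includes ad_time - window_secs and excludes ad_time.
sessions_before <- function(ad_time, log, window_secs = 5 * 60) {
  in_window <- log$datetime >= (ad_time - window_secs) & log$datetime < ad_time
  as.numeric(sum(log$sessions[in_window]))
}

# Apply the helper to each ad; indexing by row keeps Ad.Time as POSIXct
tvNationalSale$fiveMinutesBefore <- vapply(
  seq_len(nrow(tvNationalSale)),
  function(i) sessions_before(tvNationalSale$Ad.Time[i], workingNational),
  numeric(1)
)
head(tvNationalSale)
```

This scans workingNational once per ad, so it is slower than the match() approach on large data, but a missing minute only lowers the sum rather than shifting the window to the wrong rows.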