Adding a new variable computed with ddply to the original data frame in R

Posted: 2014-11-08 21:27:38

Tags: r dataframe plyr

I have a data.frame, df, that looks like this (the real one has about 1M rows):

> df

                 R.DMA.NAMES quarter     daypart allpersons.imp rate                    station  spot.id
1 Wilkes.Barre.Scranton.Hztn  Q22014   afternoon            0.0   30                       WSWB 13048713
2                  Nashville  Q12014   primetime            0.0   50              COM NASHVILLE 11969260
3             Seattle.Tacoma  Q12014   primetime            6.1   51 ESPN SEATTLE, EVERETT ZONE 11898905
4               Jacksonville  Q42013 late fringe            2.3  130          Jacksonville WAWS 11617447
5                    Detroit  Q22014   overnight            0.0    0                       WKBD 12571421
6         South.Bend.Elkhart  Q42013   primetime           11.5  325                       WBND 11741171

dput(df)

structure(list(R.DMA.NAMES = c("Wilkes.Barre.Scranton.Hztn", 
"Nashville", "Seattle.Tacoma", "Jacksonville", "Detroit", "South.Bend.Elkhart"
), quarter = structure(c(3L, 1L, 1L, 6L, 3L, 6L), .Label = c("Q12014", 
"Q22013", "Q22014", "Q32013", "Q32014", "Q42013"), class = "factor"), 
    daypart = c("afternoon", "primetime", "primetime", "late fringe", 
    "overnight", "primetime"), allpersons.imp = c(0, 0, 6.1, 
    2.3, 0, 11.5), rate = c(30, 50, 51, 130, 0, 325), station = c("WSWB", 
    "COM NASHVILLE", "ESPN SEATTLE, EVERETT ZONE", "Jacksonville WAWS", 
    "WKBD", "WBND"), spot.id = c(13048713L, 11969260L, 11898905L, 
    11617447L, 12571421L, 11741171L)), .Names = c("R.DMA.NAMES", 
"quarter", "daypart", "allpersons.imp", "rate", "station", "spot.id"
), row.names = c(NA, -6L), class = "data.frame")

I'm using ddply to do the calculation:

library(plyr)

# x is the chunk of df belonging to one (R.DMA.NAMES, station, quarter) group
ddply(df, .(R.DMA.NAMES, station, quarter), function(x) {
  sum(x$rate) / sum(x$allpersons.imp)
})

This creates a new data.frame that looks like this:

                 R.DMA.NAMES                    station quarter        V1
1                    Detroit                       WKBD  Q22014       NaN
2               Jacksonville          Jacksonville WAWS  Q42013 56.521739
3                  Nashville              COM NASHVILLE  Q12014       Inf
4             Seattle.Tacoma ESPN SEATTLE, EVERETT ZONE  Q12014  8.360656
5         South.Bend.Elkhart                       WBND  Q42013 28.260870
6 Wilkes.Barre.Scranton.Hztn                       WSWB  Q22014       Inf
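Where the Inf and NaN entries come from: every group in this sample contains a single row, so V1 is just that row's rate divided by its allpersons.imp. R's floating-point division gives Inf for a positive number over 0 and NaN for 0/0. A minimal check on a few of the rows (the variable names are illustrative only):

```r
# Each group in the sample is a single row, so cpi = rate / allpersons.imp
cpi_jacksonville <- 130 / 2.3  # finite ratio
cpi_seattle      <- 51 / 6.1   # finite ratio
cpi_detroit      <- 0 / 0      # 0/0 is NaN in R
cpi_wilkes       <- 30 / 0     # positive number / 0 is Inf in R
print(c(cpi_jacksonville, cpi_seattle, cpi_detroit, cpi_wilkes))
```

On the full data a group can span many rows; the same rule then applies to the summed rate and summed allpersons.imp.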

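What I want to do is add a new column called "cpi" to my original df, so that each row carries the cpi value of its group. The same value will of course repeat many times: for example, 8.36 should appear on every row whose R.DMA.NAMES is "Seattle.Tacoma", whose station is "ESPN SEATTLE, EVERETT ZONE" and whose quarter is Q12014. I've tried several things, including: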
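The kind of column described above, a per-group value repeated on every row of its group, can also be sketched in base R with ave(), which returns a vector with the same length and row order as its input. This is an alternative to the plyr approach, not what the question or the answer uses. The snippet rebuilds a subset of the sample data (daypart omitted, since it plays no role), and grp_rate and grp_imp are illustrative names.

```r
# Subset of the sample data from the question (daypart omitted)
df <- data.frame(
  R.DMA.NAMES = c("Wilkes.Barre.Scranton.Hztn", "Nashville", "Seattle.Tacoma",
                  "Jacksonville", "Detroit", "South.Bend.Elkhart"),
  quarter = factor(c("Q22014", "Q12014", "Q12014", "Q42013", "Q22014", "Q42013")),
  station = c("WSWB", "COM NASHVILLE", "ESPN SEATTLE, EVERETT ZONE",
              "Jacksonville WAWS", "WKBD", "WBND"),
  allpersons.imp = c(0, 0, 6.1, 2.3, 0, 11.5),
  rate = c(30, 50, 51, 130, 0, 325),
  spot.id = c(13048713L, 11969260L, 11898905L, 11617447L, 12571421L, 11741171L),
  stringsAsFactors = FALSE
)

# ave() replaces each element with FUN applied to its group,
# keeping the original row order of df
grp_rate <- ave(df$rate, df$R.DMA.NAMES, df$station, df$quarter, FUN = sum)
grp_imp  <- ave(df$allpersons.imp, df$R.DMA.NAMES, df$station, df$quarter, FUN = sum)

# Ratio of group sums, one value per row of df
df$cpi <- grp_rate / grp_imp
print(df[, c("R.DMA.NAMES", "station", "quarter", "cpi")])
```

Because ave() preserves df's row order, the ratio can be assigned straight to df$cpi.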

transform(df, cpi = ddply(df, .(R.DMA.NAMES, station, quarter), function(x) {
  sum(x$rate) / sum(x$allpersons.imp)
}))

But that didn't work! Can someone explain why?
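A likely reason the transform() attempt fails: transform() expects a value for every row of df, in df's row order, but ddply() returns a data frame with one row per (R.DMA.NAMES, station, quarter) group, sorted by those variables. On the 1M-row data the two row counts differ, and even when they happen to match the rows are not aligned. One way to bring a per-group result back is to summarise per group and merge() on the grouping columns. A minimal sketch, assuming plyr is installed; summ and out are illustrative names and the data is a subset of the sample above.

```r
library(plyr)  # assumed to be installed

# Subset of the sample data from the question (daypart omitted)
df <- data.frame(
  R.DMA.NAMES = c("Wilkes.Barre.Scranton.Hztn", "Nashville", "Seattle.Tacoma",
                  "Jacksonville", "Detroit", "South.Bend.Elkhart"),
  quarter = factor(c("Q22014", "Q12014", "Q12014", "Q42013", "Q22014", "Q42013")),
  station = c("WSWB", "COM NASHVILLE", "ESPN SEATTLE, EVERETT ZONE",
              "Jacksonville WAWS", "WKBD", "WBND"),
  allpersons.imp = c(0, 0, 6.1, 2.3, 0, 11.5),
  rate = c(30, 50, 51, 130, 0, 325),
  spot.id = c(13048713L, 11969260L, 11898905L, 11617447L, 12571421L, 11741171L),
  stringsAsFactors = FALSE
)

# One row per group, with the grouping columns plus cpi
summ <- ddply(df, .(R.DMA.NAMES, station, quarter), summarise,
              cpi = sum(rate) / sum(allpersons.imp))

# merge() matches rows on the key columns instead of relying on row position
out <- merge(df, summ, by = c("R.DMA.NAMES", "station", "quarter"))
print(out)
```

Note that merge() sorts its output by the key columns by default, so the row order of out is not df's original order either.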

Answer:

Use transform inside ddply:

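# transform() is applied to each group's chunk, so sum() is taken per group
# and the scalar cpi is recycled across every row of that chunk
ddply(df, .(R.DMA.NAMES, station, quarter),
      transform, cpi = sum(rate) / sum(allpersons.imp))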
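Applied to the sample data, the answer's call can be checked as below. One thing worth knowing: ddply() hands back the rows grouped and sorted by R.DMA.NAMES, station and quarter, not in df's original order. The sketch restores the original order with match() on spot.id, which assumes spot.id uniquely identifies a row (true in the sample, not verified for the full data). res is an illustrative name, and plyr is assumed to be installed.

```r
library(plyr)  # assumed to be installed

# Subset of the sample data from the question (daypart omitted)
df <- data.frame(
  R.DMA.NAMES = c("Wilkes.Barre.Scranton.Hztn", "Nashville", "Seattle.Tacoma",
                  "Jacksonville", "Detroit", "South.Bend.Elkhart"),
  quarter = factor(c("Q22014", "Q12014", "Q12014", "Q42013", "Q22014", "Q42013")),
  station = c("WSWB", "COM NASHVILLE", "ESPN SEATTLE, EVERETT ZONE",
              "Jacksonville WAWS", "WKBD", "WBND"),
  allpersons.imp = c(0, 0, 6.1, 2.3, 0, 11.5),
  rate = c(30, 50, 51, 130, 0, 325),
  spot.id = c(13048713L, 11969260L, 11898905L, 11617447L, 12571421L, 11741171L),
  stringsAsFactors = FALSE
)

# The answer's approach: every original column plus cpi
res <- ddply(df, .(R.DMA.NAMES, station, quarter),
             transform, cpi = sum(rate) / sum(allpersons.imp))

# ddply() returns rows ordered by the grouping variables; put them back in
# df's order, assuming spot.id is a unique row identifier
res <- res[match(df$spot.id, res$spot.id), ]
rownames(res) <- NULL
print(res)
```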