Using lapply to output values between date ranges for different factor levels

Time: 2016-09-07 11:53:56

Tags: r lapply lubridate

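I have two data frames: one holds daily sales figures for different stores (df1), the other holds the audit dates for each store (df2). I need to create a new data frame showing each site's sales for the week before each audit (i.e., each audit listed in df2). Some sample data, first the daily sales for the different stores over a period of time: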

# 100 consecutive days, with one random sales column per site
Dates <- as.data.frame(seq(as.Date("2015/12/30"), as.Date("2016/4/7"), "day"))
Sales <- as.data.frame(matrix(sample(0:50, 30*10, replace=TRUE), ncol=3))
df1 <- cbind(Dates, Sales)
colnames(df1) <- c("Dates","Site.A","Site.B","Site.C")

And the audit dates for each store:

Store <- c("Store.A","Store.A","Store.B","Store.C","Store.C")
Audit_Dates <- as.data.frame(as.POSIXct(c("2016/1/4","2016/3/1","2016/2/1","2016/2/1","2016/3/1")))
df2 <- as.data.frame(cbind(Store, Audit_Dates))
colnames(df2) <- c("Store","Audit_Dates")

Note that the number of dates in each output is uneven (i.e., some stores may not have a full week of data before an audit). I previously asked a related question, "Creating a dataframe from an lapply function with different numbers of rows". One of the answers there works when I only consider data from a single store.

But I don't know how to make this work across multiple sites.

1 answer:

Answer 0: (score: 1)

Try this:

# Renamed vars for my convenience...
colnames(df1) <- c("t","Store.A","Store.B","Store.C")
colnames(df2) <- c("Store","t")

library(tidyr)
library(dplyr)

# Gather df1 so that df1 and df2 have the same format:

df1 = gather(df1, Store, Sales, -t)
head(df1)
           t   Store Sales
1 2015-12-30 Store.A    16
2 2015-12-31 Store.A    24
3 2016-01-01 Store.A     8
4 2016-01-02 Store.A    42
5 2016-01-03 Store.A     7
6 2016-01-04 Store.A    46

# This lapply call does not iterate over actual values, just indexes, which allows
# you to subset the data comfortably:

r <- lapply(1:nrow(df2), function(i) {
   audit.t = df2[i, "t"]                                     #time of audit
   audit.s = df1[, "Store"] == df2[i, "Store"]               #store audited
   df = df1[audit.s, ]                             #data from audited store
   df[, "audited"] = audit.t              #add extra column with audit date

   week_before = difftime(df[, "t"], audit.t - (7*24*3600)) >= 0
   week_audit  = difftime(df[, "t"], audit.t) <= 0

   df[week_before & week_audit, ]
})

Does this give you the correct subset?
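One way to check the window logic is to run it on a tiny deterministic example. The sketch below is only an illustration: the `sales` data frame and the `audit` date are made-up values, and it uses plain `Date` arithmetic instead of `difftime` on a `POSIXct` audit time. In this sketch both endpoints are included, so a full window covers up to 8 daily rows.

```r
# Toy data (hypothetical values, not the random sample from the question)
sales <- data.frame(
  t     = seq(as.Date("2016-01-01"), as.Date("2016-01-10"), by = "day"),
  Store = "Store.A",
  Sales = 1:10
)
audit <- as.Date("2016-01-08")   # hypothetical audit date

# Keep rows from 7 days before the audit up to and including the audit day
in_window <- sales$t >= audit - 7 & sales$t <= audit
window <- sales[in_window, ]

range(window$t)     # first and last day kept
sum(window$Sales)   # total sales inside the window
```

If the rows kept here match what you expect by hand, the same comparison inside the lapply call should be selecting the right days for each audit.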

Also, to summarize your results:

r = do.call("rbind", r) %>% 
  group_by(audited, Store) %>% 
  summarise(sales = sum(Sales))

r

     audited   Store sales
      <time>   <chr> <int>
1 2016-01-04 Store.A    97
2 2016-02-01 Store.B   156
3 2016-02-01 Store.C   226
4 2016-03-01 Store.A   115
5 2016-03-01 Store.C   187
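For comparison, the same per-audit totals can probably be obtained without lapply, by pairing sales rows with audits and filtering. This is a base R sketch using `merge` and `aggregate` rather than the tidyr/dplyr approach above; the `sales` and `audits` data frames are hypothetical toy data, and both date columns are assumed to be of class `Date`.

```r
# Hypothetical toy data in long format (one row per store and day)
sales <- data.frame(
  t     = rep(seq(as.Date("2016-01-01"), as.Date("2016-01-10"), by = "day"), times = 2),
  Store = rep(c("Store.A", "Store.B"), each = 10),
  Sales = c(1:10, rep(2L, 10))
)
audits <- data.frame(
  Store   = c("Store.A", "Store.B", "Store.B"),
  audited = as.Date(c("2016-01-08", "2016-01-03", "2016-01-10"))
)

# Pair every sales row with every audit of the same store
joined <- merge(sales, audits, by = "Store")

# Keep rows from 7 days before each audit up to and including the audit day
in_window <- joined$t >= joined$audited - 7 & joined$t <= joined$audited

# Sum sales per store and audit date
weekly <- aggregate(Sales ~ Store + audited, data = joined[in_window, ], FUN = sum)
weekly
```

The audit on 2016-01-03 only has three days of data before it, which shows that partial weeks are handled the same way as full ones. Note that the merge creates one row per sales day and audit, so for large data the lapply version may use less memory.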