Number of times a value in one df is matched or exceeded in a second df, subject to other conditions

Time: 2018-04-25 05:16:25

Tags: r

I have a transaction history in a data frame. Each transaction has three attributes: year, size, and color.

Transactions <- data.frame(Size=c("S","S","S","S","L","L","S","L"),                     
                       Color=c("R","R","R","B","R","B","B","R"),
                       Year=c(1,1,2,1,1,1,2,2))
 Size Color Year  
 S     R     1   
 S     R     1   
 S     R     2   
 S     B     1   
 L     R     1   
 L     B     1   
 S     B     2   
 L     R     2   

So the first, second, and third transactions are SR1, SR1, and SR2. That makes three SR transactions: two in year 1 and one in year 2.

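I want a report, in the form of a df, that gives, for each combination of size and color, the number of transactions whose year matches or exceeds each year. For the data above, the correct final report would look like this.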

Size Color Year Count 
 S     R     1   3 (from obs 1,2,3 because there are 3 SRs Yr 1 or later)
 S     R     2   1 (from row 3 of transaction b/c just one SR2)
 S     B     1   2
 S     B     2   1
 L     R     1   2
 L     R     2   1
 L     B     1   1
 L     B     2   0 (because LB2 doesn't appear in Transactions)

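The row order in the report does not come from the Transactions frame. It is the full set of combinations of all Size, Color, and Year levels. In my real problem I already have a df with the structure of the first three cols of the report, so I would like to append the last col to it. Without the final col, that df would be: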

Report <- data.frame(Size= c("S","S","S","S","L","L","L","L"),
                 Color=c("R","R","B","B","R","R","B","B"),
                 Year= c(1,2,1,2,1,2,1,2)
                 )

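I would like to append the last col, but if there is a way to generate the report directly from Transactions, that is fine too. Since some report combinations may not appear in Transactions, though, I don't think that is feasible.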

1 Answer:

Answer 0 (score: 0)

Here is a solution using data.table:

Transactions <- data.frame(Size=c("S","S","S","S","L","L","S","L"),                     
                           Color=c("R","R","R","B","R","B","B","R"),
                           Year=c(1,1,2,1,1,1,2,2))
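library("data.table")
setDT(Transactions)  # convert the data.frame to a data.table by reference
# Every year that occurs anywhere in the data, in ascending order
allYears <- Transactions[, sort(unique(Year))]
# Within each Size/Color group, count the transactions whose Year is >= each year
Transactions[, .(Year=allYears, count=sapply(allYears, function(y) sum(Year>=y))), by=.(Size, Color)]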
# > Transactions[, .(Year=allYears, count=sapply(allYears, function(y) sum(Year>=y))), by=.(Size, Color)]
#    Size Color Year count
# 1:    S     R    1     3
# 2:    S     R    2     1
# 3:    S     B    1     2
# 4:    S     B    2     1
# 5:    L     R    1     2
# 6:    L     R    2     1
# 7:    L     B    1     1
# 8:    L     B    2     0
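
The grouped approach builds its rows from the Size/Color pairs that occur in Transactions, so a pair with no transactions at all would be missing from the result. Since the question already has a Report frame holding every combination, one alternative is to compute the count for each Report row directly. The following is a minimal base R sketch of that idea, not a tuned implementation; `stringsAsFactors = FALSE` is an assumption added here so the string comparisons behave the same on older R versions, and the column name `Count` follows the question's desired output.

```r
Transactions <- data.frame(Size  = c("S","S","S","S","L","L","S","L"),
                           Color = c("R","R","R","B","R","B","B","R"),
                           Year  = c(1,1,2,1,1,1,2,2),
                           stringsAsFactors = FALSE)

Report <- data.frame(Size  = c("S","S","S","S","L","L","L","L"),
                     Color = c("R","R","B","B","R","R","B","B"),
                     Year  = c(1,2,1,2,1,2,1,2),
                     stringsAsFactors = FALSE)

# For each Report row, count the transactions with the same Size and Color
# whose Year is at least that row's Year. Combinations with no matching
# transactions get 0 because sum() of an all-FALSE vector is 0.
Report$Count <- mapply(
  function(s, col, y) {
    sum(Transactions$Size == s &
        Transactions$Color == col &
        Transactions$Year >= y)
  },
  Report$Size, Report$Color, Report$Year,
  USE.NAMES = FALSE
)

print(Report)
```

This scans Transactions once per Report row, which is fine for small inputs; for large data a join-based approach would likely scale better.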