I have a transaction history in a data frame. Each transaction has three attributes: year, size, and color.
Transactions <- data.frame(Size=c("S","S","S","S","L","L","S","L"),
Color=c("R","R","R","B","R","B","B","R"),
Year=c(1,1,2,1,1,1,2,2))
Size Color Year
S R 1
S R 1
S R 2
S B 1
L R 1
L B 1
S B 2
L R 2
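So the first, second, and third transactions are SR1, SR1, and SR2. That makes three SR transactions: two in year 1 and one in year 2.
I want a report, as a data frame, that gives for every combination of size and color the number of transactions that occurred in a given year or later. For the data above, the correct final report looks like this: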
Size Color Year Count
S R 1 3 (rows 1, 2, 3: three SR transactions in year 1 or later)
S R 2 1 (row 3 only: just one SR transaction in year 2)
S B 1 2
S B 2 1
L R 1 2
L R 2 1
L B 1 1
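L B 2 0 (because LB2 does not appear in Transactions)
The row order of the report does not come from the transactions frame; it is the full crossing of all size, color, and year levels. In my real problem I already have a data frame with the structure of the report's first three columns, so I would like to append the last column to it. Without the final column, that data frame is: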
Report <- data.frame(Size= c("S","S","S","S","L","L","L","L"),
Color=c("R","R","B","B","R","R","B","B"),
Year= c(1,2,1,2,1,2,1,2)
)
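I would like to append the last column, but generating the whole report directly from Transactions would also be fine. However, since some report combinations may not appear in the transactions, I don't think that is feasible.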
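Generating the report directly is possible, because the full crossing does not depend on which combinations actually occur. The following is a minimal base-R sketch, not taken from the answer below. It assumes the column names used in the question, and it assumes that every size, color, and year level occurs at least once somewhere in Transactions. A level that never occurs at all would be missing from the generated grid, and in that case the existing Report should be used instead.

```r
# Transactions exactly as in the question (character columns, not factors)
Transactions <- data.frame(Size  = c("S","S","S","S","L","L","S","L"),
                           Color = c("R","R","R","B","R","B","B","R"),
                           Year  = c(1,1,2,1,1,1,2,2),
                           stringsAsFactors = FALSE)

# Optional: build every Size x Color x Year combination from the levels seen in
# Transactions. expand.grid() varies its first argument fastest, so Year is listed
# first and the columns are then reordered to match the Report layout above.
Report <- expand.grid(Year  = sort(unique(Transactions$Year)),
                      Color = unique(Transactions$Color),
                      Size  = unique(Transactions$Size),
                      stringsAsFactors = FALSE)[, c("Size", "Color", "Year")]

# For each report row, count the transactions with the same Size and Color whose
# Year is equal to or later than that row's Year. Combinations that never occur
# in Transactions simply sum to 0.
Report$Count <- mapply(function(s, col, y) {
  sum(Transactions$Size == s & Transactions$Color == col & Transactions$Year >= y)
}, Report$Size, Report$Color, Report$Year, USE.NAMES = FALSE)

print(Report)
```

If you already have Report, you can skip the expand.grid() step and run only the mapply() assignment against your existing data frame.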
Answer:
Here is a solution using data.table:
Transactions <- data.frame(Size=c("S","S","S","S","L","L","S","L"),
Color=c("R","R","R","B","R","B","B","R"),
Year=c(1,1,2,1,1,1,2,2))
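library("data.table")
setDT(Transactions)                                # convert to a data.table in place
allYears <- sort(Transactions[, unique(Year)])     # every year level, ascending
# For each Size/Color group, count the transactions whose Year is >= each year level
Transactions[, .(Year = allYears, count = sapply(allYears, function(y) sum(Year >= y))),
             by = .(Size, Color)]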
# Size Color Year count
# 1: S R 1 3
# 2: S R 2 1
# 3: S B 1 2
# 4: S B 2 1
# 5: L R 1 2
# 6: L R 2 1
# 7: L B 1 1
# 8: L B 2 0
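Grouping by Size and Color only produces rows for size/color pairs that occur somewhere in Transactions. Since the question wants the count appended to an existing Report, one option is an update join. The sketch below is a suggestion rather than part of the original answer. It assumes Report is converted to a data.table and uses the hypothetical column name `Count`. The join copies the counts into Report by matching on all three key columns. Report keeps its own row order, and any pair missing from Transactions is then set to 0.

```r
library(data.table)

# Data from the question, as data.tables
Transactions <- data.table(Size  = c("S","S","S","S","L","L","S","L"),
                           Color = c("R","R","R","B","R","B","B","R"),
                           Year  = c(1,1,2,1,1,1,2,2))
Report <- data.table(Size  = c("S","S","S","S","L","L","L","L"),
                     Color = c("R","R","B","B","R","R","B","B"),
                     Year  = c(1,2,1,2,1,2,1,2))

# Per-group counts, computed the same way as in the answer above
allYears <- sort(unique(Transactions$Year))
counts <- Transactions[, .(Year  = allYears,
                           Count = sapply(allYears, function(y) sum(Year >= y))),
                       by = .(Size, Color)]

# Update join: add Count to Report by matching Size, Color and Year.
# Report keeps its row order; rows without a match are left NA.
Report[counts, Count := i.Count, on = .(Size, Color, Year)]

# Size/Color pairs absent from Transactions get a count of 0
Report[is.na(Count), Count := 0L]

print(Report)
```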