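My data contain part numbers and sale dates expressed as year, month, quarter, and day. The same part can be sold more than once on the same day under different invoice numbers, so a part number can appear several times on a given day. The data look like this: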
Year <- c(2016, 2016, 2016, 2017, 2017, 2018, 2018)
Month <- c("Aug", "Sep", "Sep", "Aug", "Sep", "Aug", "Sep")
Day <- c(1, 2, 2, 1, 2, 1, 2)
Revenue <- c(147, 200, 250, 300, 200, 250, 150)
PartNumber <- c("1234", "5678", "5678", "1234", "5678", "5678", "9101")
testdf <- data.frame(Year, Month, Day, Revenue, PartNumber)
> testdf
  Year Month Day Revenue PartNumber
1 2016   Aug   1     147       1234
2 2016   Sep   2     200       5678
3 2016   Sep   2     250       5678
4 2017   Aug   1     300       1234
5 2017   Sep   2     200       5678
6 2018   Aug   1     250       5678
7 2018   Sep   2     150       9101
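What I have been doing is making a copy of the data frame, adding one to its Year column, and renaming its Revenue column to RevenueLY (revenue last year), like this: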
testdfCopy <- testdf
testdfCopy$Year <- testdfCopy$Year + 1
colnames(testdfCopy)[4] <- "RevenueLY"
mergeddf <- merge(testdf, testdfCopy, by = c("Year", "Month", "Day", "PartNumber"), all = TRUE)
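Then, after merging the two, I compare the total of Revenue in the original data frame with the total of Revenue in the merged one, and of course they differ: the 2017 Sep 2 sale of part 5678 matches both 2016 Sep 2 sales of that part, so its 200 is counted twice. I'm looking for a way around this. My real data have millions of rows, so I'm hoping for an approach that is neither manual nor slow.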
> sum(testdf$Revenue)
[1] 1497
> sum(mergeddf$Revenue, na.rm = TRUE)
[1] 1697
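To see where the extra 200 comes from, it can help to check the join key for duplicates before merging: a key that occurs on k rows makes each row it matches on the other side of the merge appear k times. A minimal base R sketch (the names `keys`, `dup_keys`, and `extra` are illustrative, not from the question):

```r
# Example data from the question
testdf <- data.frame(
  Year = c(2016, 2016, 2016, 2017, 2017, 2018, 2018),
  Month = c("Aug", "Sep", "Sep", "Aug", "Sep", "Aug", "Sep"),
  Day = c(1, 2, 2, 1, 2, 1, 2),
  Revenue = c(147, 200, 250, 300, 200, 250, 150),
  PartNumber = c("1234", "5678", "5678", "1234", "5678", "5678", "9101")
)
keys <- c("Year", "Month", "Day", "PartNumber")

# TRUE for every row whose join key also appears on another row
dup_keys <- duplicated(testdf[keys]) | duplicated(testdf[keys], fromLast = TRUE)
print(testdf[dup_keys, ])

# Repeat the shifted-year merge from the question and measure the added revenue
testdfCopy <- testdf
testdfCopy$Year <- testdfCopy$Year + 1
names(testdfCopy)[names(testdfCopy) == "Revenue"] <- "RevenueLY"
mergeddf <- merge(testdf, testdfCopy, by = keys, all = TRUE)
extra <- sum(mergeddf$Revenue, na.rm = TRUE) - sum(testdf$Revenue)
print(extra)
```

On the example data, `dup_keys` flags rows 2 and 3 (the two 2016 Sep 2 sales of part 5678) and `extra` is 200, the revenue of the single 2017 Sep 2 sale that is matched twice.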
This is the merged data frame I end up with:
> mergeddf
   Year Month Day PartNumber Revenue RevenueLY
1  2016   Aug   1       1234     147        NA
2  2016   Sep   2       5678     200        NA
3  2016   Sep   2       5678     250        NA
4  2017   Aug   1       1234     300       147
5  2017   Sep   2       5678     200       200
6  2017   Sep   2       5678     200       250
7  2018   Aug   1       1234      NA       300
8  2018   Aug   1       5678     250        NA
9  2018   Sep   2       5678      NA       200
10 2018   Sep   2       9101     150        NA
11 2019   Aug   1       5678      NA       250
12 2019   Sep   2       9101      NA       150
But what I want is:
> finaldf
  Year Month Day Revenue PartNumber RevenueLY
1 2016   Aug   1     147       1234        NA
2 2016   Sep   2     200       5678        NA
3 2016   Sep   2     250       5678        NA
4 2017   Aug   1     300       1234       147
5 2017   Sep   2     200       5678       200
6 2018   Aug   1     250       5678        NA
7 2018   Sep   2     150       9101        NA
Answer 0 (score: 0)
Here is a possible option with dplyr (create an index to join the tables on, then use left_join):
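library(dplyr)
# date key for each row, plus the key of the same month/day one year earlier
testdf <- testdf %>%
  mutate(ind = paste0(Year, Month, Day),
         ind_prev = paste0(Year - 1, Month, Day))
# join last year's revenue for the same part; repeated same-day sales still add rows
testdf %>%
  left_join(testdf %>% select(ind, PartNumber, RevenueLY = Revenue),
            by = c("ind_prev" = "ind", "PartNumber"))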
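If it is acceptable to take only the first matching sale from one year earlier, a base R lookup with `match()` avoids the duplicated rows entirely, because `match()` returns the position of the first element with that key (or `NA`). This is a minimal sketch, not the answerer's code; the `_`-separated key and the names `key_now` and `key_prev` are illustrative assumptions. Caveat: if a part sells several times on the same date this year, every one of those sales receives the revenue of the first sale last year rather than being paired one-to-one.

```r
# Example data from the question
testdf <- data.frame(
  Year = c(2016, 2016, 2016, 2017, 2017, 2018, 2018),
  Month = c("Aug", "Sep", "Sep", "Aug", "Sep", "Aug", "Sep"),
  Day = c(1, 2, 2, 1, 2, 1, 2),
  Revenue = c(147, 200, 250, 300, 200, 250, 150),
  PartNumber = c("1234", "5678", "5678", "1234", "5678", "5678", "9101")
)

# Composite key of each sale, and the key of the same part/day one year earlier
# (the "_" separator is an arbitrary choice)
key_now  <- paste(testdf$Year,     testdf$Month, testdf$Day, testdf$PartNumber, sep = "_")
key_prev <- paste(testdf$Year - 1, testdf$Month, testdf$Day, testdf$PartNumber, sep = "_")

# match() gives the FIRST row carrying key_prev (or NA), so every row gets at
# most one value and the row count never changes
testdf$RevenueLY <- testdf$Revenue[match(key_prev, key_now)]
print(testdf)
```

On the example data this keeps the 7 original rows in their original column order and fills RevenueLY with NA, NA, NA, 147, 200, NA, NA, which matches `finaldf`.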
Answer 1 (score: 0)
Based on our discussion in the comments, I think this is what you're looking for:
# use data.table
library(data.table)
setDT(testdf)
# create an ordernum so that the revenue from the first sale of part A in
# month M and year Y will be matched to the first sale of part A in month
# M and year Y+1 -- as requested by the OP
testdf[ , ordernum := 1:.N, by=.(Year, Month, PartNumber)]
# use your approach of copy, adjust year, rename-revenue
testdfCopy <- copy(testdf)
testdfCopy[ , Year := Year + 1]
testdfCopy[ , RevenueLY := Revenue]
# merge
mergeddf <- merge(testdf,
                  testdfCopy[ , .(Year, Month, ordernum, PartNumber, RevenueLY)],
                  by = c("Year", "Month", "PartNumber", "ordernum"),
                  all.x = TRUE)
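The same ordernum idea can also be written in base R without data.table. This sketch makes one assumption that differs from the answer above: it numbers sales within (Year, Month, Day, PartNumber), i.e. per day as in the question's wording, rather than per month as agreed in the comments. It then pairs the k-th sale of a part on a given day with the k-th sale on the same day one year earlier. The names `ordernum`, `lastyear`, and `finaldf` are illustrative.

```r
# Example data from the question
testdf <- data.frame(
  Year = c(2016, 2016, 2016, 2017, 2017, 2018, 2018),
  Month = c("Aug", "Sep", "Sep", "Aug", "Sep", "Aug", "Sep"),
  Day = c(1, 2, 2, 1, 2, 1, 2),
  Revenue = c(147, 200, 250, 300, 200, 250, 150),
  PartNumber = c("1234", "5678", "5678", "1234", "5678", "5678", "9101")
)

# k-th sale of a part on a given day: 1, 2, ... in row order within each group
testdf$ordernum <- ave(seq_len(nrow(testdf)),
                       testdf$Year, testdf$Month, testdf$Day, testdf$PartNumber,
                       FUN = seq_along)

# Shift a copy forward one year and keep only the key columns plus its revenue
lastyear <- testdf
lastyear$Year <- lastyear$Year + 1
lastyear <- lastyear[, c("Year", "Month", "Day", "PartNumber", "ordernum", "Revenue")]
names(lastyear)[names(lastyear) == "Revenue"] <- "RevenueLY"

# Left join: each sale matches at most one sale from a year earlier
finaldf <- merge(testdf, lastyear,
                 by = c("Year", "Month", "Day", "PartNumber", "ordernum"),
                 all.x = TRUE)
print(finaldf)
```

On the example data this returns the 7 original rows (Revenue still sums to 1497) with RevenueLY equal to NA, NA, NA, 147, 200, NA, NA. The columns come out in `merge()` order (keys first, plus the helper `ordernum`), so reorder or drop them if the exact `finaldf` layout matters.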