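My data frame, called "data", needs to be rearranged so that, after grouping by ID and country, every entry dated before 1 May (1/5) is relabeled as 1 May and its quantities are summed. If the same ID and country have several entries with different dates after 1 May, those entries should also be summed and given the earliest of their dates.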
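The relabeling half of this rule is a clamp. Any date on or before the cutoff becomes the cutoff, and later dates stay as they are. Below is a minimal base-R sketch. It assumes the cutoff is 1 May 2014 and uses a few dates from the sample data. The names cutoff, dates and relabeled are illustrative only.

```r
# Illustrative names: cutoff, dates and relabeled are not from the question
cutoff <- as.Date("2014-05-01")
dates  <- as.Date(c("07-04-2014", "28-04-2014", "05-05-2014", "23-05-2014"),
                  format = "%d-%m-%Y")

# pmax() keeps the later of each date and the cutoff,
# so entries before 1 May are moved forward to 1 May
relabeled <- pmax(dates, cutoff)
print(relabeled)
# [1] "2014-05-01" "2014-05-01" "2014-05-05" "2014-05-23"
```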
Part of my data frame (called "data"):
CreatedDate Country Alt. ItemId Qty
19-05-2014 Sweden SFND-023903 30
13-05-2014 Norway SFND-023903 10
23-05-2014 Norway SFND-023903 20
07-04-2014 Sweden SN-073628 1440
28-04-2014 Sweden SN-073628 2400
22-04-2014 Norway SN-073628 40
05-05-2014 Sweden SN-073628 840
23-05-2014 Sweden SN-073628 1559
23-05-2014 Norway SN-073628 40
I want a result like this:
CreatedDate Country Alt. ItemId Qty
19-05-2014 Sweden SFND-023903 30
13-05-2014 Norway SFND-023903 30
01-05-2014 Sweden SN-073628 3840
01-05-2014 Norway SN-073628 40
05-05-2014 Sweden SN-073628 2399
23-05-2014 Norway SN-073628 40
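The target table comes from summing Qty per item, country and side of 1 May. For SN-073628 in Sweden, that is 1440 + 2400 = 3840 before the cutoff and 840 + 1559 = 2399 after it. The base-R sketch below recomputes the table from the sample rows without dplyr. It uses a single clamp-then-aggregate pass instead of two filtered branches. The column name ItemId stands in for "Alt. ItemId" for brevity, and the helper names (cutoff, key, pieces, res) are illustrative.

```r
# Sample rows from the question; "ItemId" stands in for "Alt. ItemId"
data <- data.frame(
  CreatedDate = c("19-05-2014", "13-05-2014", "23-05-2014", "07-04-2014",
                  "28-04-2014", "22-04-2014", "05-05-2014", "23-05-2014",
                  "23-05-2014"),
  Country = c("Sweden", "Norway", "Norway", "Sweden", "Sweden",
              "Norway", "Sweden", "Sweden", "Norway"),
  ItemId  = c(rep("SFND-023903", 3), rep("SN-073628", 6)),
  Qty     = c(30, 10, 20, 1440, 2400, 40, 840, 1559, 40),
  stringsAsFactors = FALSE
)

cutoff <- as.Date("2014-05-01")
data$CreatedDate <- as.Date(data$CreatedDate, format = "%d-%m-%Y")

# Record which side of the cutoff each row is on, then clamp early dates to 1 May
data$after <- data$CreatedDate > cutoff
data$CreatedDate <- pmax(data$CreatedDate, cutoff)

# One group per item / country / side of the cutoff
key <- interaction(data$ItemId, data$Country, data$after, drop = TRUE)
pieces <- lapply(split(data, key), function(g) {
  data.frame(CreatedDate = min(g$CreatedDate),  # 1 May, or the earliest later date
             Country     = g$Country[1],
             ItemId      = g$ItemId[1],
             Qty         = sum(g$Qty),
             stringsAsFactors = FALSE)
})
res <- do.call(rbind, unname(pieces))
res <- res[order(res$ItemId, res$CreatedDate, res$Country), ]
rownames(res) <- NULL
print(res)
```

The early dates are clamped before grouping. As a result, min() returns 1 May for the early group and the first real date for the later group, so no if/else is needed.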
My current code:
library(dplyr)

# Entries after 1 May: sum Qty and keep the earliest date
d1 <- data %>%
  mutate(CreatedDate = as.Date(CreatedDate, format = "%d-%m-%Y")) %>%
  filter(CreatedDate > as.Date("2014-05-01")) %>%
  group_by(Alt..ItemId, Country) %>%
  summarize(Qty = sum(Qty), CreatedDate = min(CreatedDate))

# Entries on or before 1 May: sum Qty and relabel the date as 1 May
d2 <- data %>%
  mutate(CreatedDate = as.Date(CreatedDate, format = "%d-%m-%Y")) %>%
  filter(CreatedDate <= as.Date("2014-05-01")) %>%
  group_by(Alt..ItemId, Country) %>%
  summarize(Qty = sum(Qty), CreatedDate = as.Date("2014-05-01"))

d <- bind_rows(d1, d2)
d <- arrange(d, Alt..ItemId, CreatedDate)
How can I merge the two date conditions in d1 and d2 into a single piece of code?
Thanks.
Answer 0 (score: 2)
Here is a solution using data.table:
library(data.table)
data <- read.table(text="CreatedDate Country Alt.ItemId Qty
19-05-2014 Sweden SFND-023903 30
13-05-2014 Norway SFND-023903 10
23-05-2014 Norway SFND-023903 20
07-04-2014 Sweden SN-073628 1440
28-04-2014 Sweden SN-073628 2400
22-04-2014 Norway SN-073628 40
05-05-2014 Sweden SN-073628 840
23-05-2014 Sweden SN-073628 1559
23-05-2014 Norway SN-073628 40", header = TRUE)
setDT(data)
Now we need to fix the date format:
data[,CreatedDate := as.Date(CreatedDate,"%d-%m-%Y")]
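The explicit format matters because the dates are written day-month-year. Without it, as.Date() falls back to its default ISO-style formats, and a string such as "19-05-2014" can be silently read as a date in the year 19 instead of raising an error. A small base-R check follows. The names x, wrong and right are illustrative.

```r
# Illustrative names; the string is the first date in the sample data
x <- "19-05-2014"

# Without a format, as.Date() tries "%Y-%m-%d" first, so "19" is taken as the year
wrong <- as.Date(x)
# With the day-month-year format spelled out, the date is read as intended
right <- as.Date(x, format = "%d-%m-%Y")

print(wrong)
print(right)   # [1] "2014-05-19"
```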
Next, we create a flag based on the date:
data[,tag := CreatedDate > as.Date("2014-05-01")]
Finally, the query:
data[, list(CreatedDate = if (all(tag)) min(CreatedDate) else as.Date("2014-05-01"),
            Qty = sum(Qty)),
     by = c("Country", "Alt.ItemId", "tag")]
Hope this helps!
Answer 1 (score: 1)
I'm not a dplyr user, but here is one way:
ddt <- read.table(text='CreatedDate Country ItemId Qty
19-05-2014 Sweden SFND-023903 30
13-05-2014 Norway SFND-023903 10
23-05-2014 Norway SFND-023903 20
07-04-2014 Sweden SN-073628 1440
28-04-2014 Sweden SN-073628 2400
22-04-2014 Norway SN-073628 40
05-05-2014 Sweden SN-073628 840
23-05-2014 Sweden SN-073628 1559
23-05-2014 Norway SN-073628 40',header=TRUE)
ddt$CreatedDate <- as.Date(ddt$CreatedDate,format="%d-%m-%Y")
library(dplyr)
ddt %>%
  mutate(flag = CreatedDate > as.Date("2014-05-01")) %>%
  group_by(ItemId, Country, flag) %>%
  summarize(Qty = sum(Qty),
            CreatedDate = if (all(!flag)) as.Date("2014-05-01")
                          else min(CreatedDate))
# ItemId Country flag Qty CreatedDate
# 1 SFND-023903 Norway TRUE 30 2014-05-13
# 2 SFND-023903 Sweden TRUE 30 2014-05-19
# 3 SN-073628 Norway FALSE 40 2014-05-01
# 4 SN-073628 Norway TRUE 40 2014-05-23
# 5 SN-073628 Sweden FALSE 3840 2014-05-01
# 6 SN-073628 Sweden TRUE 2399 2014-05-05