Summarizing data

Date: 2014-06-05 06:41:07

Tags: r dataframe nested

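My data frame, called "data", needs to be rearranged so that, after grouping by ID and country, all entries dated before 1/5 (1 May) are summed and labelled 1/5. If the same ID and country have several entries with different dates after 1/5, they should be summed as well and labelled with the earliest of those dates.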
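The labelling rule itself can be sketched in base R as two vectorised steps, assuming 1 May 2014 is the cutoff and that an entry dated exactly on the cutoff counts as "before" it: a logical flag marks entries after the cutoff, and `pmax()` lifts every earlier date up to the cutoff. The helper name `label_dates` is hypothetical; the dates are the four SN-073628 / Sweden rows from the sample data.

```r
# Hypothetical helper: flag entries after the cutoff and clamp earlier dates to it
label_dates <- function(dates, cutoff = as.Date("2014-05-01")) {
  data.frame(
    date  = dates,
    after = dates > cutoff,       # TRUE only for entries strictly after the cutoff
    label = pmax(dates, cutoff)   # dates on or before the cutoff become the cutoff
  )
}

# SN-073628 / Sweden entries from the sample data
d <- as.Date(c("07-04-2014", "28-04-2014", "05-05-2014", "23-05-2014"),
             format = "%d-%m-%Y")
res <- label_dates(d)
res
```

Grouping by item, country and `after`, then taking `sum(Qty)` and `min(label)`, gives the two rows per item/country that the question asks for.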

Part of my data frame (called "data"):

CreatedDate Country Alt. ItemId Qty
19-05-2014  Sweden  SFND-023903 30
13-05-2014  Norway  SFND-023903 10
23-05-2014  Norway  SFND-023903 20
07-04-2014  Sweden  SN-073628   1440
28-04-2014  Sweden  SN-073628   2400
22-04-2014  Norway  SN-073628   40
05-05-2014  Sweden  SN-073628   840
23-05-2014  Sweden  SN-073628   1559
23-05-2014  Norway  SN-073628   40

I want a result like this:

CreatedDate Country Alt. ItemId Qty
19-05-2014  Sweden  SFND-023903 30
13-05-2014  Norway  SFND-023903 30
01-05-2014  Sweden  SN-073628   3840
01-05-2014  Norway  SN-073628   40
05-05-2014  Sweden  SN-073628   2399
23-05-2014  Norway  SN-073628   40
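As a cross-check of the expected result, here is a minimal base-R sketch (a swapped-in technique, not the dplyr approach used in the question) that builds the sample data, clamps pre-May dates to 1 May with `pmax()`, and sums `Qty` per item, country and before/after flag. It assumes the `Alt. ItemId` column is renamed to `ItemId` and that an entry dated exactly 1 May belongs to the "before" group.

```r
# Sample rows from the question ("Alt. ItemId" shortened to ItemId)
data <- read.table(text = "CreatedDate Country ItemId Qty
19-05-2014 Sweden SFND-023903 30
13-05-2014 Norway SFND-023903 10
23-05-2014 Norway SFND-023903 20
07-04-2014 Sweden SN-073628 1440
28-04-2014 Sweden SN-073628 2400
22-04-2014 Norway SN-073628 40
05-05-2014 Sweden SN-073628 840
23-05-2014 Sweden SN-073628 1559
23-05-2014 Norway SN-073628 40", header = TRUE, stringsAsFactors = FALSE)

cutoff <- as.Date("2014-05-01")
data$CreatedDate <- as.Date(data$CreatedDate, format = "%d-%m-%Y")
data$after <- data$CreatedDate > cutoff               # before/after flag
data$CreatedDate <- pmax(data$CreatedDate, cutoff)    # pre-cutoff dates -> 1 May

# One output row per (item, country, flag) group
groups <- split(data, list(data$ItemId, data$Country, data$after), drop = TRUE)
result <- do.call(rbind, lapply(groups, function(g) data.frame(
  CreatedDate = min(g$CreatedDate),   # earliest (clamped) date in the group
  Country     = g$Country[1],
  ItemId      = g$ItemId[1],
  Qty         = sum(g$Qty),           # total quantity in the group
  stringsAsFactors = FALSE
)))
rownames(result) <- NULL
result <- result[order(result$ItemId, result$CreatedDate), ]
result
```

The six rows match the table above: 1440 + 2400 = 3840 and 840 + 1559 = 2399 for SN-073628 in Sweden, and 10 + 20 = 30 for SFND-023903 in Norway.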

My current code:

library(dplyr)

# Entries after 1 May: sum per item/country and keep the earliest date
d1 <- data %>%
  mutate(CreatedDate = as.Date(CreatedDate, format = "%d-%m-%Y")) %>%
  filter(CreatedDate > as.Date("2014-05-01")) %>%
  group_by(Alt..ItemId, Country) %>%
  summarize(Qty = sum(Qty), CreatedDate = min(CreatedDate))

# Entries on or before 1 May: sum per item/country and label them 1 May
d2 <- data %>%
  mutate(CreatedDate = as.Date(CreatedDate, format = "%d-%m-%Y")) %>%
  filter(CreatedDate <= as.Date("2014-05-01")) %>%
  group_by(Alt..ItemId, Country) %>%
  summarize(Qty = sum(Qty), CreatedDate = as.Date("2014-05-01"))

# Stack both parts and sort by item, then date
d <- bind_rows(d1, d2)
d <- arrange(d, Alt..ItemId, CreatedDate)
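Two details matter when parsing and splitting these dates. First, `format` has to be an argument of `as.Date()`; written as a separate `mutate()` argument it only creates an extra column named `format`, and `as.Date()` falls back to its default year-first formats. Second, the two filters must not both include the cutoff, otherwise a row dated exactly 1 May is counted in both d1 and d2. A small base-R sketch with three hypothetical dates around the cutoff:

```r
cutoff <- as.Date("2014-05-01")

# format is passed to as.Date() itself, so dd-mm-yyyy strings parse correctly
dates <- as.Date(c("30-04-2014", "01-05-2014", "02-05-2014"), format = "%d-%m-%Y")

# Overlapping filters (>= and <=): the 1 May row satisfies both conditions
overlap <- sum(dates >= cutoff) + sum(dates <= cutoff)   # 2 + 2 = 4 rows for 3 dates

# Disjoint filters (> and <=): every row lands in exactly one part
disjoint <- sum(dates > cutoff) + sum(dates <= cutoff)   # 1 + 2 = 3 rows
```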

How can I combine the two date conditions used in d1 and d2 into a single piece of code?

Thanks.

2 Answers:

Answer 0 (score: 2)

Here is a solution using data.table:

library(data.table)

data <- read.table(text="CreatedDate Country Alt.ItemId Qty
19-05-2014  Sweden  SFND-023903 30
13-05-2014  Norway  SFND-023903 10
23-05-2014  Norway  SFND-023903 20
07-04-2014  Sweden  SN-073628   1440
28-04-2014  Sweden  SN-073628   2400
22-04-2014  Norway  SN-073628   40
05-05-2014  Sweden  SN-073628   840
23-05-2014  Sweden  SN-073628   1559
23-05-2014  Norway  SN-073628   40",header=TRUE)

setDT(data)

Now we need to convert the dates from the dd-mm-yyyy text format to Date:

data[,CreatedDate := as.Date(CreatedDate,"%d-%m-%Y")]

Next, we create a flag that marks entries dated after 1 May:

data[,tag := CreatedDate > as.Date("2014-05-01")]

Finally, the query:

data[, list(CreatedDate = if (all(tag)) min(CreatedDate) else as.Date("2014-05-01"),
            Qty = sum(Qty)),
     by = c("Country", "Alt.ItemId", "tag")]

Hope this helps!

Answer 1 (score: 1)

I'm not much of a dplyr user, but here is one way to do it:

ddt <- read.table(text='CreatedDate Country ItemId Qty
19-05-2014  Sweden  SFND-023903 30
13-05-2014  Norway  SFND-023903 10
23-05-2014  Norway  SFND-023903 20
07-04-2014  Sweden  SN-073628   1440
28-04-2014  Sweden  SN-073628   2400
22-04-2014  Norway  SN-073628   40
05-05-2014  Sweden  SN-073628   840
23-05-2014  Sweden  SN-073628   1559
23-05-2014  Norway  SN-073628   40',header=TRUE)

ddt$CreatedDate <-  as.Date(ddt$CreatedDate,format="%d-%m-%Y")


library(dplyr)
ddt %>%
  mutate(flag = CreatedDate > as.Date("2014-05-01")) %>%   # TRUE = after 1 May
  group_by(ItemId, Country, flag) %>%
  summarize(Qty = sum(Qty),
            CreatedDate = if (all(!flag)) as.Date("2014-05-01")
                          else min(CreatedDate))

#   ItemId       Country  flag  Qty CreatedDate
# 1 SFND-023903  Norway  TRUE   30  2014-05-13
# 2 SFND-023903  Sweden  TRUE   30  2014-05-19
# 3   SN-073628  Norway FALSE   40  2014-05-01
# 4   SN-073628  Norway  TRUE   40  2014-05-23
# 5   SN-073628  Sweden FALSE 3840  2014-05-01
# 6   SN-073628  Sweden  TRUE 2399  2014-05-05
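A variant of the same idea (a sketch, not taken from either answer) avoids the `if`/`else` inside `summarize()`: clamp every date to the cutoff with `pmax()` before grouping, so that `min()` alone yields 1 May for the earlier group. It assumes dplyr 1.0 or later (for the `.groups` argument) and, like both answers, treats an entry dated exactly 1 May as part of the earlier group.

```r
suppressPackageStartupMessages(library(dplyr))

ddt <- read.table(text = "CreatedDate Country ItemId Qty
19-05-2014 Sweden SFND-023903 30
13-05-2014 Norway SFND-023903 10
23-05-2014 Norway SFND-023903 20
07-04-2014 Sweden SN-073628 1440
28-04-2014 Sweden SN-073628 2400
22-04-2014 Norway SN-073628 40
05-05-2014 Sweden SN-073628 840
23-05-2014 Sweden SN-073628 1559
23-05-2014 Norway SN-073628 40", header = TRUE, stringsAsFactors = FALSE)
ddt$CreatedDate <- as.Date(ddt$CreatedDate, format = "%d-%m-%Y")

cutoff <- as.Date("2014-05-01")

res <- ddt %>%
  mutate(flag = CreatedDate > cutoff,                  # computed from the original date
         CreatedDate = pmax(CreatedDate, cutoff)) %>%  # earlier dates become 1 May
  group_by(ItemId, Country, flag) %>%
  summarise(Qty = sum(Qty), CreatedDate = min(CreatedDate), .groups = "drop") %>%
  arrange(ItemId, CreatedDate)
res
```

Because `flag` is computed before `CreatedDate` is overwritten in the same `mutate()` call, it still reflects the original dates.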