Find when an item first appears in a dataset

Time: 2015-10-08 06:42:30

Tags: r

I have a dataset, ProductTable, and for every ProductsFamily I want to return the dates on which it was first and last ordered. Example:

ProductTable
OrderPostingYear OrderPostingMonth OrderPostingDate ProductsFamily Sales QTY
2008             1                 20               R1             5234  1
2008             1                 12               R2             223   2
2009             1                 30               R3             34    1
2008             2                 1                R1             1634  3
2010             4                 23               R3             224   1
2009             3                 20               R1             5234  1
2010             7                 12               R2             223   2

The desired result looks like this:

OrderTime
ProductsFamily OrderStart OrderEnd  SumSales
R1             2008/1/20  2009/3/20 12102
R2             2008/1/12  2010/7/12 446
R3             2009/1/30  2010/4/23 258

I don't know how to do this. Any suggestions?

ProductTable <- structure(list(OrderPostingYear = c(2008L, 2008L, 2009L, 2008L, 
2010L, 2009L, 2010L), OrderPostingMonth = c(1L, 1L, 1L, 2L, 4L, 
3L, 7L), OrderPostingDate = c(20L, 12L, 30L, 1L, 23L, 20L, 12L
), ProductsFamily = structure(c(1L, 2L, 3L, 1L, 3L, 1L, 2L), .Label = c("R1", 
"R2", "R3"), class = "factor"), Sales = c(5234L, 223L, 34L, 1634L, 
224L, 5234L, 223L), QTY = c(1L, 2L, 1L, 3L, 1L, 1L, 2L)), .Names = c("OrderPostingYear", 
"OrderPostingMonth", "OrderPostingDate", "ProductsFamily", "Sales", 
"QTY"), class = "data.frame", row.names = c(NA, -7L))

2 answers:

Answer 0 (score: 4)

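We can also do this with dplyr/tidyr. We arrange the rows by family, year, month, and day, unite the year, month, and day columns into a single 'Date' column, group by 'ProductsFamily', and then take the first and last 'Date' and the sum of 'Sales' inside summarise. Because the rows are sorted before grouping, first and last return the earliest and latest dates in each family.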

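library(dplyr)
library(tidyr)
ProductTable %>% 
   # sort chronologically within each family so first()/last() are meaningful
   arrange(ProductsFamily, OrderPostingYear, OrderPostingMonth, OrderPostingDate) %>% 
   # paste year, month and day into one character column, e.g. "2008/1/20"
   unite(Date, OrderPostingYear:OrderPostingDate, sep='/') %>% 
   group_by(ProductsFamily) %>%
   # one row per family: earliest date, latest date, total sales
   summarise(OrderStart=first(Date), OrderEnd=last(Date), SumSales=sum(Sales)) 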
# Source: local data frame [3 x 4]

#  ProductsFamily OrderStart  OrderEnd SumSales
#            (fctr)      (chr)     (chr)    (int)   
# 1             R1  2008/1/20 2009/3/20    12102
# 2             R2  2008/1/12 2010/7/12      446
# 3             R3  2009/1/30 2010/4/23      258
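
A possible variant of the approach above, sketched under the assumption that a real Date column is preferable to the character strings produced by unite. With dates stored as Date, min and max pick the earliest and latest order directly, so the result no longer depends on the rows being sorted first. The names Date and OrderTime are illustrative and not part of the original answer; the data frame is rebuilt here (with ProductsFamily as character) only to keep the snippet self-contained.

```r
library(dplyr)

# Same values as the question's ProductTable
ProductTable <- data.frame(
  OrderPostingYear  = c(2008L, 2008L, 2009L, 2008L, 2010L, 2009L, 2010L),
  OrderPostingMonth = c(1L, 1L, 1L, 2L, 4L, 3L, 7L),
  OrderPostingDate  = c(20L, 12L, 30L, 1L, 23L, 20L, 12L),
  ProductsFamily    = c("R1", "R2", "R3", "R1", "R3", "R1", "R2"),
  Sales             = c(5234L, 223L, 34L, 1634L, 224L, 5234L, 223L),
  QTY               = c(1L, 2L, 1L, 3L, 1L, 1L, 2L)
)

OrderTime <- ProductTable %>%
  # build a proper Date from the three integer columns
  mutate(Date = as.Date(paste(OrderPostingYear, OrderPostingMonth,
                              OrderPostingDate, sep = "-"),
                        format = "%Y-%m-%d")) %>%
  group_by(ProductsFamily) %>%
  # min()/max() compare dates chronologically, regardless of row order
  summarise(OrderStart = min(Date),
            OrderEnd   = max(Date),
            SumSales   = sum(Sales))

OrderTime
```

If the slash-separated layout from the question is needed for display, format(OrderTime$OrderStart, "%Y/%m/%d") would give a zero-padded version such as "2008/01/20".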

Answer 1 (score: 3)

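You can first build the date in a new column, and then aggregate the data with the data.table package (the first and last date per ProductsFamily, plus the sum of the sales):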

library(data.table)

# First build up the date
ProductTable$date = with(ProductTable, 
                         as.Date(paste(OrderPostingYear, 
                                       OrderPostingMonth, 
                                       OrderPostingDate, sep = "." ), 
                                 format = "%Y.%m.%d"))

# In a second step, aggregate your data
setDT(ProductTable)[,list(OrderStart = sort(date)[1],
                          OrderEnd   = sort(date)[.N],
                          SumSales   = sum(Sales))
                    ,ProductsFamily]

#   ProductsFamily OrderStart   OrderEnd SumSales
#1:             R1 2008-01-20 2009-03-20    12102
#2:             R2 2008-01-12 2010-07-12      446
#3:             R3 2009-01-30 2010-04-23      258
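
For comparison, the same two steps (build a date, then aggregate per family) can be sketched in base R without any extra package. This is only one way to do it, not something proposed in the answers above: min() and max() stand in for sort(date)[1] and sort(date)[.N], and the names Date and OrderTime are illustrative. The data frame is rebuilt so the snippet runs on its own.

```r
# Same values as the question's ProductTable
ProductTable <- data.frame(
  OrderPostingYear  = c(2008L, 2008L, 2009L, 2008L, 2010L, 2009L, 2010L),
  OrderPostingMonth = c(1L, 1L, 1L, 2L, 4L, 3L, 7L),
  OrderPostingDate  = c(20L, 12L, 30L, 1L, 23L, 20L, 12L),
  ProductsFamily    = c("R1", "R2", "R3", "R1", "R3", "R1", "R2"),
  Sales             = c(5234L, 223L, 34L, 1634L, 224L, 5234L, 223L),
  QTY               = c(1L, 2L, 1L, 3L, 1L, 1L, 2L)
)

# Step 1: build a Date column from year, month and day
ProductTable$Date <- as.Date(
  with(ProductTable, paste(OrderPostingYear, OrderPostingMonth,
                           OrderPostingDate, sep = "-")),
  format = "%Y-%m-%d"
)

# Step 2: split by family and compute one summary row per piece
pieces <- lapply(split(ProductTable, ProductTable$ProductsFamily), function(d) {
  data.frame(ProductsFamily = d$ProductsFamily[1],
             OrderStart     = min(d$Date),
             OrderEnd       = max(d$Date),
             SumSales       = sum(d$Sales))
})
OrderTime <- do.call(rbind, pieces)
rownames(OrderTime) <- NULL

OrderTime
#   ProductsFamily OrderStart   OrderEnd SumSales
# 1             R1 2008-01-20 2009-03-20    12102
# 2             R2 2008-01-12 2010-07-12      446
# 3             R3 2009-01-30 2010-04-23      258
```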