In R

Time: 2016-04-27 00:01:00

Tags: r transform

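I'm dealing with a problem at work involving purchase order (PO), goods receipt and invoice data. The problem with the current data (R code below) is that each PO journey is split across two rows, one per value of MD.Doc.Type.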

PO.Document.Number <- c("Doc1", "Doc1", "Doc2", "Doc2")
PO.Document.date   <- c("12.01.2016", "12.01.2016", "03.01.2016", "03.01.2016")
PO.Vendor          <- c("200001", "200001", "200002", "200002")
PO.Vendor.Name     <- c("Vendor1", "Vendor1", "Vendor2", "Vendor2")
BuyerCode          <- c("G01", "G01", "G02", "G02")
MD.Doc.Number      <- c("500087", "510035", "500099", "510050")
MD.Doc.Type        <- c("GR", "INV", "GR", "INV")
MD.Posting.Date    <- c("04.03.2016", "04.03.2016", "09.03.2016", "15.03.2016")
MD.Amount          <- c("-67.5", "80", "-420.39", "-420.29")
df <- data.frame(PO.Document.Number, PO.Document.date, PO.Vendor, PO.Vendor.Name,
                 BuyerCode, MD.Doc.Number, MD.Doc.Type, MD.Posting.Date, MD.Amount,
                 stringsAsFactors = FALSE)  # keep text columns as character, not factors
rm(list = setdiff(ls(), "df"))  # remove the helper vectors, keep only df

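I need to collapse each PO into a single row, as shown below (the 4 rows become 2). In the original data, the first two rows and the last two rows carry the same PO information (PO number, date, vendor, etc.). In the transformed data, MD.Posting.Date becomes either "GR Posting Date" or "Inv Posting Date" depending on the value of MD.Doc.Type; the same applies to MD.Amount and MD.Doc.Number.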

PO-Document Number  PO-Document date  PO-Vendor  PO-Vendor-Name  BuyerCode  GR Number  GR Posting Date  GR-Amount  Inv Number  Inv Posting Date  Inv Amount
Doc1                12.01.2016        200001     Vendor1         G01        500087     04.03.2016       -67.5      510035      04.03.2016        80
Doc2                03.01.2016        200002     Vendor2         G02        500099     09.03.2016       -420.39    510050      15.03.2016        -420.29
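For comparison, here is a minimal sketch of the same long-to-wide transformation using base R's stats::reshape() rather than the reshape/reshape2 packages. It assumes, as in the sample, that each PO has at most one GR row and one INV row; the po_cols vector and the final renaming step are illustrative choices, not part of the question.

```r
# Sample data from the question, with text kept as character (no factors)
df <- data.frame(
  PO.Document.Number = c("Doc1", "Doc1", "Doc2", "Doc2"),
  PO.Document.date   = c("12.01.2016", "12.01.2016", "03.01.2016", "03.01.2016"),
  PO.Vendor          = c("200001", "200001", "200002", "200002"),
  PO.Vendor.Name     = c("Vendor1", "Vendor1", "Vendor2", "Vendor2"),
  BuyerCode          = c("G01", "G01", "G02", "G02"),
  MD.Doc.Number      = c("500087", "510035", "500099", "510050"),
  MD.Doc.Type        = c("GR", "INV", "GR", "INV"),
  MD.Posting.Date    = c("04.03.2016", "04.03.2016", "09.03.2016", "15.03.2016"),
  MD.Amount          = c("-67.5", "80", "-420.39", "-420.29"),
  stringsAsFactors = FALSE
)

# PO-level columns (illustrative name); they must be constant within a PO,
# otherwise collapsing to one row per PO would silently drop information
po_cols <- c("PO.Document.Number", "PO.Document.date", "PO.Vendor",
             "PO.Vendor.Name", "BuyerCode")
stopifnot(nrow(unique(df[po_cols])) == length(unique(df$PO.Document.Number)))

# Long -> wide: one row per PO; each MD.* column is spread by MD.Doc.Type,
# producing names such as MD.Doc.Number.GR and MD.Amount.INV
wide <- reshape(df,
                idvar     = po_cols,
                timevar   = "MD.Doc.Type",
                v.names   = c("MD.Doc.Number", "MD.Posting.Date", "MD.Amount"),
                direction = "wide")
rownames(wide) <- NULL

# Rename the generated columns to the headers used in the desired output
old <- c("MD.Doc.Number.GR", "MD.Posting.Date.GR", "MD.Amount.GR",
         "MD.Doc.Number.INV", "MD.Posting.Date.INV", "MD.Amount.INV")
new <- c("GR Number", "GR Posting Date", "GR Amount",
         "Inv Number", "Inv Posting Date", "Inv Amount")
names(wide)[match(old, names(wide))] <- new
print(wide)
```

reshape() keeps the idvar columns once per PO and appends one set of MD.* columns per distinct MD.Doc.Type value, in order of first appearance (GR, then INV here).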

What I have tried so far:

library(reshape)  # cast() comes from the reshape package
df1 <- cast(data = df, PO.Document.Number + PO.Document.date + PO.Vendor + PO.Vendor.Name + BuyerCode + MD.Doc.Number + MD.Posting.Date ~ MD.Doc.Type)

but I'm not sure how to proceed from here. Thanks for your help.
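A small base R check illustrating why the cast() attempt above still returns four rows: cast() produces one output row per distinct combination of the variables on the left of ~, and MD.Doc.Number and MD.Posting.Date differ between the GR and INV rows of the same PO. The data frame below is a hypothetical subset of the question's columns; the other PO-level columns are left out because they are constant within a PO and do not change the count.

```r
# Hypothetical subset of the question's data: the PO key plus the
# per-document columns that appear on the left-hand side of the cast() formula
df <- data.frame(
  PO.Document.Number = c("Doc1", "Doc1", "Doc2", "Doc2"),
  MD.Doc.Number      = c("500087", "510035", "500099", "510050"),
  MD.Doc.Type        = c("GR", "INV", "GR", "INV"),
  MD.Posting.Date    = c("04.03.2016", "04.03.2016", "09.03.2016", "15.03.2016"),
  stringsAsFactors = FALSE
)

# Row identifiers as used in the attempted formula (PO key + per-document columns)
ids_attempt <- c("PO.Document.Number", "MD.Doc.Number", "MD.Posting.Date")
# Row identifiers that describe the PO alone
ids_po <- "PO.Document.Number"

# cast()/dcast() emit one row per distinct combination of the row identifiers
n_attempt <- nrow(unique(df[ids_attempt]))
n_target  <- nrow(unique(df[ids_po]))

cat("rows produced by the attempted formula:", n_attempt, "\n")
cat("rows wanted (one per PO):", n_target, "\n")
```

In other words, the per-document columns have to move out of the row formula and become values that are spread by MD.Doc.Type.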

1 answer

Answer 0 (score: 1)

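Consider running several dcast() calls, one per measure, and binding their results column-wise into a single data frame: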

library(reshape2)

...

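# FIXED COLUMNS: one row per PO, sorted by PO number so the rows line up
# with the dcast() output (dcast sorts rows by the formula's left-hand side)
fixdf <- unique(df[c('PO.Document.Number', 'PO.Document.date',
                     'PO.Vendor', 'PO.Vendor.Name', 'BuyerCode')])
fixdf <- fixdf[order(fixdf$PO.Document.Number), ]
rownames(fixdf) <- NULL

# CASTED COLUMNS: one dcast() per measure, each widened on MD.Doc.Type
finaldf <- cbind(fixdf,
                 dcast(df, PO.Document.Number ~ MD.Doc.Type,
                       value.var = 'MD.Doc.Number')[, c('GR', 'INV')],
                 dcast(df, PO.Document.Number ~ MD.Doc.Type,
                       value.var = 'MD.Posting.Date')[, c('GR', 'INV')],
                 dcast(df, PO.Document.Number ~ MD.Doc.Type,
                       value.var = 'MD.Amount')[, c('GR', 'INV')])

# RENAMING CASTED COLUMNS
names(finaldf)[6:11] <- c('GR Number', 'Inv Number',
                          'GR Posting.Date', 'Inv Posting.Date',
                          'GR Amount', 'Inv Amount')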

#  PO.Document.Number PO.Document.date PO.Vendor PO.Vendor.Name BuyerCode
# 1              Doc1       12.01.2016    200001        Vendor1      G01
# 2              Doc2       03.01.2016    200002        Vendor2      G02
#   GR Number Inv Number GR Posting.Date Inv Posting.Date  GR Amount Inv Amount
# 1    500087     510035      04.03.2016       04.03.2016      -67.5         80
# 2    500099     510050      09.03.2016       15.03.2016    -420.39    -420.29
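A sketch of an alternative that joins on PO.Document.Number with base R's merge() instead of relying on cbind() keeping rows aligned. Like the answer, it assumes at most one GR and one INV document per PO; by_type() is a hypothetical helper introduced only for this example.

```r
# Sample data from the question, with text kept as character (no factors)
df <- data.frame(
  PO.Document.Number = c("Doc1", "Doc1", "Doc2", "Doc2"),
  PO.Document.date   = c("12.01.2016", "12.01.2016", "03.01.2016", "03.01.2016"),
  PO.Vendor          = c("200001", "200001", "200002", "200002"),
  PO.Vendor.Name     = c("Vendor1", "Vendor1", "Vendor2", "Vendor2"),
  BuyerCode          = c("G01", "G01", "G02", "G02"),
  MD.Doc.Number      = c("500087", "510035", "500099", "510050"),
  MD.Doc.Type        = c("GR", "INV", "GR", "INV"),
  MD.Posting.Date    = c("04.03.2016", "04.03.2016", "09.03.2016", "15.03.2016"),
  MD.Amount          = c("-67.5", "80", "-420.39", "-420.29"),
  stringsAsFactors = FALSE
)

# One row per PO with the PO-level attributes
fixdf <- unique(df[c("PO.Document.Number", "PO.Document.date", "PO.Vendor",
                     "PO.Vendor.Name", "BuyerCode")])

# Hypothetical helper: keep the rows of one document type and rename its
# MD.* columns with a prefix, keeping PO.Document.Number as the join key
by_type <- function(data, type, prefix) {
  part <- data[data$MD.Doc.Type == type,
               c("PO.Document.Number", "MD.Doc.Number", "MD.Posting.Date", "MD.Amount")]
  names(part) <- c("PO.Document.Number",
                   paste(prefix, c("Number", "Posting Date", "Amount")))
  part
}

# Join on the PO number, so the result does not depend on row order;
# all.x = TRUE keeps a PO even if it has no GR or no invoice yet
finaldf <- merge(fixdf, by_type(df, "GR", "GR"),
                 by = "PO.Document.Number", all.x = TRUE)
finaldf <- merge(finaldf, by_type(df, "INV", "Inv"),
                 by = "PO.Document.Number", all.x = TRUE)
print(finaldf)
```

With all.x = TRUE, a PO that has a goods receipt but no invoice yet still appears in the result, with NA in the Inv columns.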