Classifying data into buckets

Time: 2016-12-31 12:50:44

Tags: r pivot

I have a data frame named Data that contains the following columns:

Model  Garage  City  Unit.Price Invoice.Date  Components    
Hyundai  A      NY     500        31/12/2016   HL   
Honda    B      NJ     700        31/12/2016   TL     
Porsche  A      NY     800        30/12/2016   TL    
BMW      B      NJ     800        30/12/2016   HL   
BMW      A      NJ     700        31/12/2016   HL   
Porsche  B      NY     800        30/12/2016   TL   
Honda    A      NY     400        30/12/2016   TL  
Honda    A      NY     500        30/12/2016   HL  
Honda    B      NY     600        30/12/2016   HL  
Honda    A      NY     200        29/12/2016   TL  
Honda    A      NY     300        29/12/2016   HL  

I want the output broken down by car, ordered by Invoice.Date so that the most recent cost is captured first.

Ex: Honda

Components    GarageA   GarageB    
HL             500          600    
TL             400          700 

This is what I started with:

Category <- as.data.frame(c("BMW","Honda","Porsche","Hyundai"))

for(i in 1:nrow(Category))
{
  m <- Category[i,1]
  X <- subset(Data,Model==m)
  X <- Data[order(Data$Invoice.Date,decreasing = T),]
  Pivot_A<-dcast(X,Name~Garage,value.var = "Unit.Price",function(x) length((x)))
  write.csv(Pivot,file = paste(X,"Cars.csv",sep = "_"))
 }

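The only problem I have is mapping the correct unit price. Is there any code or function that can do this with dcast? The aggregation options I know of for dcast are sum and count. What if I want the exact amount rather than a sum or an average?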

3 answers:

Answer 0: (score: 1)

You can do it as follows:

require(tidyverse) # dplyr would be enough...
dat %>% 
  mutate(Invoice.Date = as.Date(Invoice.Date, "%d/%m/%Y")) %>% 
  group_by(Model, Garage, Components) %>% 
  summarise(Unit.Price = first(Unit.Price, order_by = Invoice.Date)) %>% 
  spread(Garage, Unit.Price, sep = "")

This gives you:

    Model Components GarageA GarageB
*   <chr>      <chr>   <int>   <int>
1     BMW         HL     700     800
2   Honda         HL     300     600
3   Honda         TL     200     700
4 Hyundai         HL     500      NA
5 Porsche         TL     800     800

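Now, I'm not sure how to interpret "broken down by car" in your question. You can pipe (%>%) the above into

  • split(.$Model) to get a list in which each list element represents one Model
  • nest(-Model) to get a nested tibble ...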
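
Note that first(Unit.Price, order_by = Invoice.Date) picks the price from the earliest invoice in each group, which is why Honda shows 300 and 200 above. If the goal is the most recent price, as in the Honda example in the question, a minimal sketch could use last() instead. The dat data frame below is rebuilt from the question's sample, pivot_wider() is swapped in for spread() on the assumption that a recent tidyr is installed, and the names latest and per_model are only illustrative.

```r
library(dplyr)
library(tidyr)

# Sample data copied from the question
dat <- data.frame(
  Model        = c("Hyundai", "Honda", "Porsche", "BMW", "BMW", "Porsche",
                   "Honda", "Honda", "Honda", "Honda", "Honda"),
  Garage       = c("A", "B", "A", "B", "A", "B", "A", "A", "B", "A", "A"),
  City         = c("NY", "NJ", "NY", "NJ", "NJ", "NY", "NY", "NY", "NY", "NY", "NY"),
  Unit.Price   = c(500, 700, 800, 800, 700, 800, 400, 500, 600, 200, 300),
  Invoice.Date = c("31/12/2016", "31/12/2016", "30/12/2016", "30/12/2016",
                   "31/12/2016", "30/12/2016", "30/12/2016", "30/12/2016",
                   "30/12/2016", "29/12/2016", "29/12/2016"),
  Components   = c("HL", "TL", "TL", "HL", "HL", "TL", "TL", "HL", "HL", "TL", "HL"),
  stringsAsFactors = FALSE
)

latest <- dat %>%
  # Parse the dd/mm/yyyy strings so ordering is chronological, not alphabetical
  mutate(Invoice.Date = as.Date(Invoice.Date, format = "%d/%m/%Y")) %>%
  group_by(Model, Garage, Components) %>%
  # last() with order_by returns the price attached to the latest invoice
  summarise(Unit.Price = last(Unit.Price, order_by = Invoice.Date),
            .groups = "drop") %>%
  # One column per garage: GarageA, GarageB
  pivot_wider(names_from = Garage, values_from = Unit.Price,
              names_prefix = "Garage")

# One table per car model
per_model <- split(latest, latest$Model)
per_model$Honda
```

With the question's sample, the Honda element has HL = 500/600 and TL = 400/700, which matches the table in the question.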

Answer 1: (score: 0)

We can do this using dcast from data.table. Convert the 'data.frame' to a 'data.table' (setDT(df1)), order by 'Invoice.Date', and reshape from 'long' to 'wide' using dcast while specifying fun.aggregate so that only the first observation is selected.
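
The code for this answer did not survive, so the following is only a sketch of the approach it describes, not the answerer's original. It assumes df1 holds the question's sample, sorts newest first so that "first observation" means the most recent invoice, and uses a helper column named GarageCol (a made-up name) to get the GarageA/GarageB headers.

```r
library(data.table)

# Sample data copied from the question
df1 <- data.frame(
  Model        = c("Hyundai", "Honda", "Porsche", "BMW", "BMW", "Porsche",
                   "Honda", "Honda", "Honda", "Honda", "Honda"),
  Garage       = c("A", "B", "A", "B", "A", "B", "A", "A", "B", "A", "A"),
  City         = c("NY", "NJ", "NY", "NJ", "NJ", "NY", "NY", "NY", "NY", "NY", "NY"),
  Unit.Price   = c(500, 700, 800, 800, 700, 800, 400, 500, 600, 200, 300),
  Invoice.Date = c("31/12/2016", "31/12/2016", "30/12/2016", "30/12/2016",
                   "31/12/2016", "30/12/2016", "30/12/2016", "30/12/2016",
                   "30/12/2016", "29/12/2016", "29/12/2016"),
  Components   = c("HL", "TL", "TL", "HL", "HL", "TL", "TL", "HL", "HL", "TL", "HL"),
  stringsAsFactors = FALSE
)

setDT(df1)  # convert to data.table by reference

# Parse dd/mm/yyyy and sort with the newest invoice first
df1[, Invoice.Date := as.IDate(Invoice.Date, format = "%d/%m/%Y")]
setorderv(df1, "Invoice.Date", order = -1L)

# Helper column (hypothetical name) that becomes the wide column headers
df1[, GarageCol := paste0("Garage", Garage)]

# fun.aggregate keeps only the first value per cell, i.e. the latest price
wide <- dcast(df1, Model + Components ~ GarageCol,
              value.var = "Unit.Price",
              fun.aggregate = function(x) x[1],
              fill = NA)
wide
```

Here fun.aggregate plays the role that sum or length would otherwise play in dcast: because the rows are already sorted, x[1] returns the exact amount from the latest invoice rather than an aggregate.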

Answer 2: (score: 0)

Consider R's best package, base:

library(base)  # COMPLETELY REDUNDANT =)

df <- df[with(df, order(Invoice.Date)),]
dfagg <- aggregate(Unit.Price ~ Model + Components + Garage, df, function(i) tail(i)[1])
dfwide <- reshape(dfagg, timevar='Garage', idvar=c('Model', 'Components'), direction="wide")
names(dfwide) <- gsub("Unit.Price.", "Garage", names(dfwide))

#     Model Components GarageA GarageB
# 1     BMW         HL     700     800
# 2   Honda         HL     300     600
# 3 Hyundai         HL     500      NA
# 4   Honda         TL     200     700
# 5 Porsche         TL     800     800
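
One caveat: tail(i)[1] returns the first of the last six values, so for these small groups it picks the earliest invoice, and sorting Invoice.Date as text only works while every date shares the same month and year. A hedged variant, assuming the goal is the most recent price, parses the dates and uses tail(x, 1). It also writes one CSV per model, as the loop in the question tried to do. The df data frame is rebuilt from the question's sample, and out_dir and the file names are placeholders.

```r
# Sample data copied from the question
df <- data.frame(
  Model        = c("Hyundai", "Honda", "Porsche", "BMW", "BMW", "Porsche",
                   "Honda", "Honda", "Honda", "Honda", "Honda"),
  Garage       = c("A", "B", "A", "B", "A", "B", "A", "A", "B", "A", "A"),
  City         = c("NY", "NJ", "NY", "NJ", "NJ", "NY", "NY", "NY", "NY", "NY", "NY"),
  Unit.Price   = c(500, 700, 800, 800, 700, 800, 400, 500, 600, 200, 300),
  Invoice.Date = c("31/12/2016", "31/12/2016", "30/12/2016", "30/12/2016",
                   "31/12/2016", "30/12/2016", "30/12/2016", "30/12/2016",
                   "30/12/2016", "29/12/2016", "29/12/2016"),
  Components   = c("HL", "TL", "TL", "HL", "HL", "TL", "TL", "HL", "HL", "TL", "HL"),
  stringsAsFactors = FALSE
)

# Parse dd/mm/yyyy so that ordering is chronological (oldest first)
df$Invoice.Date <- as.Date(df$Invoice.Date, format = "%d/%m/%Y")
df <- df[order(df$Invoice.Date), ]

# tail(x, 1) keeps the last value per group, i.e. the latest invoice's price
dfagg <- aggregate(Unit.Price ~ Model + Components + Garage, data = df,
                   FUN = function(x) tail(x, 1))

# Long to wide: one Unit.Price column per garage
dfwide <- reshape(dfagg, timevar = "Garage", idvar = c("Model", "Components"),
                  direction = "wide")
names(dfwide) <- sub("^Unit\\.Price\\.", "Garage", names(dfwide))

# One CSV per model; out_dir is a placeholder location
out_dir <- tempdir()
pieces <- split(dfwide, dfwide$Model)
for (m in names(pieces)) {
  write.csv(pieces[[m]][, c("Components", "GarageA", "GarageB")],
            file = file.path(out_dir, paste0(m, "_Cars.csv")),
            row.names = FALSE)
}
```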