How to reshape a matrix and fill missing values with 0

Asked: 2019-01-29 10:34:21

Tags: r reshape data-manipulation

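I have a question about manipulating a matrix structure in R. I need to transpose the matrix first, then combine the month and status columns, filling missing values with 0. An example of my current data is below. It looks tricky to me. I would appreciate any help. Thanks.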

Hi, my data looks like this:

  structure(list(Customer = c("1096261", "1096261", "1169502", 
    "1169502"), Phase = c("2", "3", "1", "2"), Status = c("Ontime", 
    "Ontime", "Ontime", "Ontime"), Amount = c(21216.32, 42432.65, 
    200320.05, 84509.24)), .Names = c("Customer", "Phase", "Status", 
    "Amount"), row.names = c(NA, -4L), class = c("grouped_df", "tbl_df", 
    "tbl", "data.frame"), vars = c("Customer", "Phase"), drop = TRUE, indices 
    = list(
    0L, 1L, 2L, 3L), group_sizes = c(1L, 1L, 1L, 1L), biggest_group_size = 1L, 
    labels = structure(list(
    Customer = c("1096261", "1096261", "1169502", "1169502"), 
    Phase = c("2", "3", "1", "2")), row.names = c(NA, -4L), class = 
    "data.frame", vars = c("Customer", 
    "Phase"), drop = TRUE, .Names = c("Customer", "Phase")))   

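I need a reshaped matrix with the following columns:
Customer, Phase1earlyTotal, Phase2earlyTotal, ..., Phase4earlyTotal, ..., Phase1_Ontimetotal, ..., Phase4_Ontimetotal, ..., Phase1LateTotal ... Phase4LateTotal. For example, Phase1earlytotal is the sum of Amount over the rows where Phase = 1 and Status = Early.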

Currently I use the following script, which does not work because I don't know how to combine the "Phase" and "Status" columns.

   mydata2<-data.table(mydata2,V3,V4)
    mydata2$V4<-NULL
    datacus <- data.frame(mydata2[-1,],stringsAsFactors = F); 
    datacus <- datacus %>% mutate(Phase= as.numeric(Phase),Amount= 
   as.numeric(Amount)) %>%
   complete(Phase = 1:4,fill= list(Amount = 0)) %>% 
   dcast(datacus~V3, value.var = 'Amount',fill = 0) %>% select(Phase, V3) 
   %>%t()

1 Answer:

Answer 0: (score: 0)

I believe you are looking for something like this?

Sample data

df <- structure(list(Customer = c("1096261", "1096261", "1169502", "1169502"),
                     Phase = c("2", "3", "1", "2"),
                     Status = c("Ontime", "Ontime", "Ontime", "Ontime"),
                     Amount = c(21216.32, 42432.65, 200320.05, 84509.24)),
                .Names = c("Customer", "Phase", "Status", "Amount"),
                row.names = c(NA, -4L),
                class = c("grouped_df", "tbl_df", "tbl", "data.frame"),
                vars = c("Customer", "Phase"), drop = TRUE,
                indices = list(0L, 1L, 2L, 3L),
                group_sizes = c(1L, 1L, 1L, 1L), biggest_group_size = 1L,
                labels = structure(list(
                  Customer = c("1096261", "1096261", "1169502", "1169502"),
                  Phase = c("2", "3", "1", "2")),
                  row.names = c(NA, -4L), class = "data.frame",
                  vars = c("Customer", "Phase"), drop = TRUE,
                  .Names = c("Customer", "Phase")))

#    Customer Phase Status    Amount
# 1:  1096261     2 Ontime  21216.32
# 2:  1096261     3 Ontime  42432.65
# 3:  1169502     1 Ontime 200320.05
# 4:  1169502     2 Ontime  84509.24

Code

library(data.table)
# Cast to wide: one row per Customer, one column per Phase_Status pair, summing Amount
dcast(setDT(df), Customer ~ Phase + Status, fun.aggregate = sum, value.var = "Amount")[]

Output

#    Customer 1_Ontime 2_Ontime 3_Ontime
# 1:  1096261        0 21216.32 42432.65
# 2:  1169502   200320 84509.24     0.00
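
The output above only contains columns for Phase/Status pairs that actually occur in the data. The question also asks for columns covering every phase from 1 to 4 and the statuses Early, Ontime and Late, with 0 where nothing was recorded. One possible way to get there, sketched below, is to build the full grid of combinations with `CJ()`, join the data onto it, and then cast. The names `dt`, `grid`, `full` and `wide` are made up for this illustration, and the sketch assumes the phase range 1 to 4 and the three status labels are as described in the question.

```r
library(data.table)

# Same values as the sample data, rebuilt as a plain data.table (illustrative)
dt <- data.table(
  Customer = c("1096261", "1096261", "1169502", "1169502"),
  Phase    = c("2", "3", "1", "2"),
  Status   = "Ontime",
  Amount   = c(21216.32, 42432.65, 200320.05, 84509.24)
)

# Every Customer x Phase x Status combination we want a cell for
# (assumed ranges: phases 1-4, statuses Early/Ontime/Late)
grid <- CJ(
  Customer = unique(dt$Customer),
  Phase    = as.character(1:4),
  Status   = c("Early", "Ontime", "Late")
)

# Join the data onto the grid; combinations with no data get Amount = NA
full <- dt[grid, on = c("Customer", "Phase", "Status")]
full[is.na(Amount), Amount := 0]

# Cast to wide: one row per Customer, one column per Phase_Status pair
wide <- dcast(full, Customer ~ Phase + Status,
              fun.aggregate = sum, value.var = "Amount")
wide[]
```

This should give one row per customer and twelve value columns named like `1_Early` or `4_Late`. If names such as `Phase1EarlyTotal` are needed, the columns could be renamed afterwards with `setnames()`.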