Reshape a data frame

Date: 2019-01-03 10:24:58

Tags: r dataframe

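I'm having trouble converting data from one table layout to another. Currently I use two "for" loops, which is quite time-consuming for my dataset (> 50,000 variables). In the output, each unique "Name" should get a single row holding all of that name's information. Any suggestions for solving this?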

My dataset:

 Date_Y Name Amount Score
2010    A    150   1.8
2011    A    120   1.2
2012    A    175   1.3
2010    B    160   1.9
2011    C    120   1.0
2012    C    110   2.0
2013    C    155   3.0

Target dataset:

Name Amount_2010 Amount_2011 Amount_2012 Amount_2013 Score_2010 Score_2011 Score_2012 Score_2013
"A"  "150"       "120"       "175"       "NA"        "1.8"      "1.2"      "1.3"      "NA"
"B"  "160"       "NA"        "NA"        "NA"        "1.9"      "NA"       "NA"       "NA"
"C"  "NA"        "120"       "110"       "155"       "NA"       "1"        "2"        "3"

My current code:

rm(list = ls())
df <- data.frame(Date_Y = c(2010, 2011, 2012, 2010, 2011, 2012, 2013),
                 Name   = c("A", "A", "A", "B", "C", "C", "C"),
                 Amount = c(150, 120, 175, 160, 120, 110, 155),
                 Score  = c(1.8, 1.2, 1.3, 1.9, 1, 2.0, 3))

Name_List <- unique(as.character(df$Name))
My_dates  <- c("2010", "2011", "2012", "2013")

# Empty matrix with one column per variable/year combination, plus Name
new_df <- matrix(ncol = 9, nrow = 0)
colnames(new_df) <- c("Name", "Amount_2010", "Amount_2011", "Amount_2012", "Amount_2013",
                      "Score_2010", "Score_2011", "Score_2012", "Score_2013")

for (i in seq_along(Name_List)) {
  results <- rep("NA", 8)   # slots 1-4: Amount per year, slots 5-8: Score per year
  for (j in seq_along(My_dates)) {
    # Row(s) of df for this name and year (empty if there is no data)
    rows <- which(df$Name == Name_List[i] & df$Date_Y == My_dates[j])
    if (length(rows) > 0) {
      results[j]     <- df$Amount[rows]
      results[j + 4] <- df$Score[rows]
    }
  }
  # rbind() inside the loop copies the whole matrix on every iteration,
  # which adds to the running time on large data
  new_df <- rbind(new_df, c(Name_List[i], results))
}

new_df
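For comparison, the same table can be filled without any loops: compute, for every input row, its target row (match() on Name) and its target year slot (match() on Date_Y), then assign all values at once with matrix indexing. This is a minimal base-R sketch that assumes each Name/year pair occurs at most once; names_u, years, out, r and k are illustrative names, not part of the original code.

```r
df <- data.frame(Date_Y = c(2010, 2011, 2012, 2010, 2011, 2012, 2013),
                 Name   = c("A", "A", "A", "B", "C", "C", "C"),
                 Amount = c(150, 120, 175, 160, 120, 110, 155),
                 Score  = c(1.8, 1.2, 1.3, 1.9, 1, 2.0, 3))

names_u <- unique(as.character(df$Name))  # one output row per Name
years   <- sort(unique(df$Date_Y))        # 2010 ... 2013

# Preallocate: all Amount columns first, then all Score columns; NA = no data
out <- matrix(NA_real_, nrow = length(names_u), ncol = 2 * length(years),
              dimnames = list(names_u,
                              c(paste0("Amount_", years), paste0("Score_", years))))

r <- match(df$Name, names_u)  # target row of each input row
k <- match(df$Date_Y, years)  # target year slot of each input row

# Vectorised fill; a duplicated (Name, Date_Y) pair would silently overwrite
out[cbind(r, k)]                 <- df$Amount
out[cbind(r, k + length(years))] <- df$Score

result <- data.frame(Name = names_u, out, row.names = NULL)
result
```

The result keeps numeric columns with real NA values instead of the "NA" strings of a character matrix, which is usually easier to work with afterwards.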

1 answer:

Answer 0 (score: 4)

We can use the reshape2 package instead. Melting the data frame gives

library(reshape2)
(melted <- melt(df, c("Date_Y", "Name")))
#    Date_Y Name variable value
# 1    2010    A   Amount 150.0
# 2    2011    A   Amount 120.0
# 3    2012    A   Amount 175.0
# 4    2010    B   Amount 160.0
# 5    2011    C   Amount 120.0
# 6    2012    C   Amount 110.0
# 7    2013    C   Amount 155.0
# 8    2010    A    Score   1.8
# 9    2011    A    Score   1.2
# 10   2012    A    Score   1.3
# 11   2010    B    Score   1.9
# 12   2011    C    Score   1.0
# 13   2012    C    Score   2.0
# 14   2013    C    Score   3.0
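What melt() does here can be reproduced in base R, which may help in reading the output above: each measure column (Amount, Score) becomes its own block of rows next to the id columns, and the blocks are stacked. A minimal sketch under that reading; measures and long are illustrative names, and unlike melt() the variable column here is plain character rather than a factor.

```r
df <- data.frame(Date_Y = c(2010, 2011, 2012, 2010, 2011, 2012, 2013),
                 Name   = c("A", "A", "A", "B", "C", "C", "C"),
                 Amount = c(150, 120, 175, 160, 120, 110, 155),
                 Score  = c(1.8, 1.2, 1.3, 1.9, 1, 2.0, 3))

measures <- c("Amount", "Score")

# One block per measure column (id columns repeated), then stack the blocks
long <- do.call(rbind, lapply(measures, function(m) {
  data.frame(Date_Y = df$Date_Y, Name = df$Name,
             variable = m, value = df[[m]],
             stringsAsFactors = FALSE)
}))
long
```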

so that we can now use dcast to get

dcast(melted, Name ~ variable + Date_Y)
#   Name Amount_2010 Amount_2011 Amount_2012 Amount_2013 Score_2010 Score_2011 Score_2012 Score_2013
# 1    A         150         120         175          NA        1.8        1.2        1.3         NA
# 2    B         160          NA          NA          NA        1.9         NA         NA         NA
# 3    C          NA         120         110         155         NA        1.0        2.0          3
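If adding a package is not an option, base R's stats::reshape() can do the same long-to-wide conversion in one call, with Date_Y as the time variable and Name as the id. This is a swapped-in alternative, not the answer's method: reshape() interleaves the columns (Amount_2010, Score_2010, Amount_2011, ...), so the sketch reorders them afterwards to match the target layout; wide and years are illustrative names.

```r
df <- data.frame(Date_Y = c(2010, 2011, 2012, 2010, 2011, 2012, 2013),
                 Name   = c("A", "A", "A", "B", "C", "C", "C"),
                 Amount = c(150, 120, 175, 160, 120, 110, 155),
                 Score  = c(1.8, 1.2, 1.3, 1.9, 1, 2.0, 3))

# Every column other than idvar/timevar (Amount, Score) is spread across years;
# sep = "_" builds names like "Amount_2010"; missing combinations become NA
wide <- reshape(df, idvar = "Name", timevar = "Date_Y",
                direction = "wide", sep = "_")

# reshape() groups columns by year; put all Amount columns before all Score columns
years <- sort(unique(df$Date_Y))
wide  <- wide[, c("Name", paste0("Amount_", years), paste0("Score_", years))]
rownames(wide) <- NULL
wide
```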