如何使用R中的plyr包创建与给定列的级别对应的新数据框列?

时间:2012-06-01 15:02:18

标签: r plyr reshape

我正在寻找一种“优雅”的方式来基本上按一个列变量的级别拆分数据框,然后创建一个新的输出数据框,重新设置为现在删除因子变量并为因子的级别添加新列变量。我可以使用split()方法这样的函数来做到这一点,但这对我来说似乎是一种混乱的方式。我一直在尝试使用plyr包中的melt()和cast()函数来做到这一点,但是没有成功获得我需要的确切输出。

以下是我的数据:

> jumbo.df = read.csv(...)
> head(jumbo.df)
         PricingDate  Name     Rate 
    186  2012-03-05   Type A   2.875  
    187  2012-03-05   Type B   3.250  
    188  2012-03-05   Type C   3.750  
    189  2012-03-05   Type D   3.750  
    190  2012-03-05   Type E   4.500  
    191  2012-03-06   Type A   2.875

我想做的是按变量 name 拆分,删除 Name Rate ,然后输出 Type的列A B型 C型 D型 E型及相应的速率日期为ID的系列:

> head(output.df)
         PricingDate  Type A   Type B    Type C    Type D    Type E 
         2012-03-05    2.875    3.250     3.750     3.750     4.500  
         2012-03-06    2.875    ...

谢谢!

2 个答案:

答案 0 :(得分:4)

不确定我是否帮助您,但是您是否只想将数据重塑为宽幅格式?如果是这样,您必须使用melt(!)包的castreshape函数。 reshape2基本相同。由于您的数据已经处于熔融格式,即长格式,因此单行执行您想要的操作:

df <- read.table(textConnection("PricingDate  Name     Rate 
                                 2012-03-05   TypeA   2.875  
                                 2012-03-05   TypeB   3.250  
                                 2012-03-05   TypeC   3.750  
                                 2012-03-05   TypeD   3.750  
                                 2012-03-05   TypeE   4.500  
                                 2012-03-06   TypeA   2.875"), header=TRUE, row.names=NULL)
library(reshape2)
dcast(df, PricingDate ~ Name)
Using Rate as value column: use value.var to override.
  PricingDate TypeA TypeB TypeC TypeD TypeE
1  2012-03-05 2.875  3.25  3.75  3.75   4.5
2  2012-03-06 2.875    NA    NA    NA    NA

答案 1 :(得分:1)

library(plyr)
library(reshape2)

    data <- structure(list(PricingDate = c("2012-03-05", "2012-03-05", "2012-03-05", 
    "2012-03-05", "2012-03-05", "2012-03-06", "2012-03-06", "2012-03-06", 
    "2012-03-06", "2012-03-06"), Name = c("Type A", "Type B", "Type C", 
    "Type D", "Type E", "Type A", "Type B", "Type C", "Type D", "Type E"
    ), Rate = c(2.875, 3.25, 3.75, 3.75, 4.5, 4.875, 5.25, 6.75, 
    7.75, 8.5)), .Names = c("PricingDate", "Name", "Rate"), class = "data.frame", row.names = c("186", 
    "187", "188", "189", "190", "191", "192", "193", "194", "195"
    ))


    > data
        PricingDate   Name  Rate
    186  2012-03-05 Type A 2.875
    187  2012-03-05 Type B 3.250
    188  2012-03-05 Type C 3.750
    189  2012-03-05 Type D 3.750
    190  2012-03-05 Type E 4.500
    191  2012-03-06 Type A 4.875
    192  2012-03-06 Type B 5.250
    193  2012-03-06 Type C 6.750
    194  2012-03-06 Type D 7.750
    195  2012-03-06 Type E 8.500


    ddply(data, .(PricingDate), function(x) reshape(x, idvar="PricingDate", timevar="Name", direction="wide"))



    PricingDate Rate.Type A Rate.Type B Rate.Type C Rate.Type D
    1  2012-03-05       2.875        3.25        3.75        3.75
    2  2012-03-06       4.875        5.25        6.75        7.75
      Rate.Type E
    1         4.5
    2         8.5