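I'm looking for an "elegant" way to split a data frame by the levels of one column variable and then build a new output data frame, reshaped so that the factor variable is removed and each level of the factor becomes a new column. I can do this with functions such as split(), but that seems like a messy way to go about it. I have been trying to do it with the melt() and cast() functions from the plyr package, but haven't managed to get the exact output I need.
Here is my data: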
> jumbo.df = read.csv(...)
> head(jumbo.df)
PricingDate Name Rate
186 2012-03-05 Type A 2.875
187 2012-03-05 Type B 3.250
188 2012-03-05 Type C 3.750
189 2012-03-05 Type D 3.750
190 2012-03-05 Type E 4.500
191 2012-03-06 Type A 2.875
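What I want is to split by the Name variable, drop the Name and Rate columns, and output one column each for Type A, Type B, Type C, Type D and Type E, holding the corresponding rates, with the date as the ID: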
> head(output.df)
PricingDate Type A Type B Type C Type D Type E
2012-03-05 2.875 3.250 3.750 3.750 4.500
2012-03-06 2.875 ...
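For comparison, the split()-based route mentioned in the question might look roughly like the sketch below. It assumes the six sample rows shown above have been typed in as a small data frame called jumbo (a hypothetical stand-in for jumbo.df), splits the rows by Name, renames each piece's Rate column after its type, and outer-joins the pieces back together on PricingDate. It works, but the extra Map/Reduce/merge plumbing is exactly what makes this approach feel clumsy.

```r
# Hypothetical stand-in for jumbo.df, using only the six rows shown above
jumbo <- data.frame(
  PricingDate = c(rep("2012-03-05", 5), "2012-03-06"),
  Name        = c("Type A", "Type B", "Type C", "Type D", "Type E", "Type A"),
  Rate        = c(2.875, 3.250, 3.750, 3.750, 4.500, 2.875),
  stringsAsFactors = FALSE
)

# One data frame (PricingDate, Rate) per level of Name
pieces <- split(jumbo[, c("PricingDate", "Rate")], jumbo$Name)

# Rename each piece's Rate column to the type it belongs to
pieces <- Map(function(d, type) setNames(d, c("PricingDate", type)),
              pieces, names(pieces))

# Outer-join all pieces on PricingDate; missing combinations become NA
wide <- Reduce(function(a, b) merge(a, b, by = "PricingDate", all = TRUE), pieces)
wide
```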
Thanks!
Answer 0 (score: 4)
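Not sure I understand you correctly, but do you simply want to reshape your data into wide format? If so, what you need are the melt and cast functions from the reshape (!) package; reshape2 works in basically the same way. Since your data is already in molten (i.e. long) format, a one-liner does what you want: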
df <- read.table(textConnection("PricingDate Name Rate
2012-03-05 TypeA 2.875
2012-03-05 TypeB 3.250
2012-03-05 TypeC 3.750
2012-03-05 TypeD 3.750
2012-03-05 TypeE 4.500
2012-03-06 TypeA 2.875"), header=TRUE, row.names=NULL)
library(reshape2)
dcast(df, PricingDate ~ Name, value.var = "Rate")
PricingDate TypeA TypeB TypeC TypeD TypeE
1 2012-03-05 2.875 3.25 3.75 3.75 4.5
2 2012-03-06 2.875 NA NA NA NA
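To make concrete what dcast() does for this data, here is a rough base-R illustration (not reshape2's actual implementation): one row per unique PricingDate, one column per unique Name, each Rate placed in its cell, and NA wherever a date/type combination is missing. It assumes each PricingDate/Name pair occurs at most once; with duplicate pairs dcast() would need an aggregation function, whereas this sketch would silently keep the last value.

```r
# Same six rows as in the answer above, read with base R only
df <- read.table(text = "PricingDate Name Rate
2012-03-05 TypeA 2.875
2012-03-05 TypeB 3.250
2012-03-05 TypeC 3.750
2012-03-05 TypeD 3.750
2012-03-05 TypeE 4.500
2012-03-06 TypeA 2.875", header = TRUE, stringsAsFactors = FALSE)

# Row keys and column keys of the wide table
dates <- sort(unique(df$PricingDate))
types <- sort(unique(df$Name))

# Start with an all-NA grid, then place each Rate at its (date, type) cell
m <- matrix(NA_real_, nrow = length(dates), ncol = length(types),
            dimnames = list(NULL, types))
m[cbind(match(df$PricingDate, dates), match(df$Name, types))] <- df$Rate

# Prepend the date column; check.names = FALSE keeps the type names untouched
wide <- data.frame(PricingDate = dates, m, check.names = FALSE)
wide
```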
Answer 1 (score: 1)
library(plyr)
library(reshape2)
data <- structure(list(PricingDate = c("2012-03-05", "2012-03-05", "2012-03-05",
"2012-03-05", "2012-03-05", "2012-03-06", "2012-03-06", "2012-03-06",
"2012-03-06", "2012-03-06"), Name = c("Type A", "Type B", "Type C",
"Type D", "Type E", "Type A", "Type B", "Type C", "Type D", "Type E"
), Rate = c(2.875, 3.25, 3.75, 3.75, 4.5, 4.875, 5.25, 6.75,
7.75, 8.5)), .Names = c("PricingDate", "Name", "Rate"), class = "data.frame", row.names = c("186",
"187", "188", "189", "190", "191", "192", "193", "194", "195"
))
> data
PricingDate Name Rate
186 2012-03-05 Type A 2.875
187 2012-03-05 Type B 3.250
188 2012-03-05 Type C 3.750
189 2012-03-05 Type D 3.750
190 2012-03-05 Type E 4.500
191 2012-03-06 Type A 4.875
192 2012-03-06 Type B 5.250
193 2012-03-06 Type C 6.750
194 2012-03-06 Type D 7.750
195 2012-03-06 Type E 8.500
ddply(data, .(PricingDate), function(x) reshape(x, idvar="PricingDate", timevar="Name", direction="wide"))
PricingDate Rate.Type A Rate.Type B Rate.Type C Rate.Type D Rate.Type E
1 2012-03-05 2.875 3.25 3.75 3.75 4.5
2 2012-03-06 4.875 5.25 6.75 7.75 8.5
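The ddply() wrapper is not strictly required for this: base R's stats::reshape() already produces one row per idvar value when given the whole data frame. A minimal sketch, assuming the same ten rows as above (rebuilt here more compactly) and assuming the "Rate." prefix on the new column names is unwanted:

```r
# Same ten rows as the structure() call above, built more compactly
data <- data.frame(
  PricingDate = rep(c("2012-03-05", "2012-03-06"), each = 5),
  Name        = rep(paste("Type", LETTERS[1:5]), times = 2),
  Rate        = c(2.875, 3.25, 3.75, 3.75, 4.5, 4.875, 5.25, 6.75, 7.75, 8.5),
  stringsAsFactors = FALSE
)

# Long -> wide in one call: one row per PricingDate, one Rate.<Name> column per type
wide <- reshape(data, idvar = "PricingDate", timevar = "Name", direction = "wide")

# Optional tidy-up: drop the "Rate." prefix and reset the row names
names(wide) <- sub("^Rate\\.", "", names(wide))
rownames(wide) <- NULL
wide
```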