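I am trying to reshape and "expand" a data.frame based on values contained in the data.frame itself. Below is the structure of the data frame I am starting with:
Beginning structure: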
'data.frame': 9 obs. of 5 variables:
$ Delivery.Location : chr "Henry" "Henry" "Henry" "Henry" ...
$ Price : num 2.97 2.96 2.91 2.85 2.89 ...
$ Trade.Date : Date, format: "2012-01-03" "2012-01-04" "2012-01-05" "2012-01-06" ...
$ Delivery.Start.Date : Date, format: "2012-01-04" "2012-01-05" "2012-01-06" "2012-01-07" ...
$ Delivery.End.Date : Date, format: "2012-01-04" "2012-01-05" "2012-01-06" "2012-01-09" ...
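The market this price data comes from is called a "next-day market" because the actual delivery of the natural gas usually takes place on the day after the gas is traded (i.e., the day after Trade.Date above). I stress usually because exceptions occur on weekends and holidays, in which case the delivery period can span multiple days (i.e., 2-3 days). However, the data structure explicitly provides Delivery.Start.Date and Delivery.End.Date.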
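I am trying to restructure the data.frame in the following way so that I can produce some time series charts and do further analysis:
Desired structure: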
$ Delivery.Location
$ Trade.Date
$ Delivery.Date <<<-- How do I create this variable?
$ Price
How do I create the Delivery.Date variable based on the existing Delivery.Start.Date and Delivery.End.Date variables?
In other words, the data for the 2012-01-06 Trade.Date looks like this:
Delivery Location Price Trade.Date Delivery.Start.Date Delivery.End.Date
Henry 2.851322 2012-01-06 2012-01-07 2012-01-09
I would like to somehow "fill in" the Delivery.Location and Price for 2012-01-08 so that I get a result like this:
Delivery Location Price Trade.Date Delivery.Date
Henry 2.851322 2012-01-06 2012-01-07
Henry 2.851322 2012-01-06 2012-01-08 <--new record "filled in"
Henry 2.851322 2012-01-06 2012-01-09
Here is a sample subset of my data.frame:
##--------------------------------------------------------------------------------------------
## sample data
##--------------------------------------------------------------------------------------------
df <- structure(list(Delivery.Location = c("Henry", "Henry", "Henry", "Henry", "Henry", "Henry", "Henry", "Henry", "Henry"), Price = c(2.96539814293754, 2.95907652120467, 2.9064360152398, 2.85132233314846, 2.89036418816388,2.9655845029802, 2.80773394495413, 2.70207160426346, 2.67173237617745), Trade.Date = structure(c(15342, 15343, 15344, 15345, 15348, 15349, 15350, 15351, 15352), class = "Date"), Delivery.Start.Date = structure(c(15343, 15344, 15345, 15346, 15349, 15350, 15351, 15352, 15353), class = "Date"), Delivery.End.Date = structure(c(15343, 15344, 15345, 15348, 15349, 15350, 15351, 15352, 15356), class = "Date")), .Names = c("Delivery.Location", "Price", "Trade.Date", "Delivery.Start.Date", "Delivery.End.Date"), row.names = c(35L, 150L, 263L, 377L, 493L, 607L, 724L, 838L, 955L), class = "data.frame")
str(df)
##--------------------------------------------------------------------------------------------
## create sequence of Delivery.Dates to potentially use
##--------------------------------------------------------------------------------------------
rng <- range(c(range(df$Delivery.Start.Date), range(df$Delivery.End.Date)))
Delivery.Date <- seq(rng[1], rng[2], by=1)
Any assistance or general direction would be greatly appreciated.
Answer 0 (score: 2)
You can use ddply from the plyr package:
library(plyr)
ddply(
df,
c("Delivery.Location","Trade.Date"),
function(trade)
data.frame(
trade,
Delivery.Date=seq(
from=trade$Delivery.Start.Date,
to=trade$Delivery.End.Date,
by="day")
)
)
Of course, you still need to implement the logic for weekends, holidays, and so on.
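As a rough starting point for that logic, the sketch below only flags which delivery dates fall on a weekend or on a holiday; what you then do with the flagged rows depends on your analysis. The holidays vector is a hypothetical, hand-maintained list used purely for illustration, and the three dates are the delivery window of the 2012-01-06 trade from the question.

```r
# Delivery window of the 2012-01-06 trade from the question
delivery_dates <- seq(as.Date("2012-01-07"), as.Date("2012-01-09"), by = "day")

# "%u" formats the weekday as a number, 1 = Monday ... 7 = Sunday,
# so it does not depend on the locale's weekday names
weekday_num <- as.integer(format(delivery_dates, "%u"))
is_weekend  <- weekday_num >= 6L

# Assumption: a hand-maintained holiday list (hypothetical, for illustration only)
holidays   <- as.Date("2012-01-16")
is_holiday <- delivery_dates %in% holidays

flags <- data.frame(
  Delivery.Date = delivery_dates,
  Is.Weekend    = is_weekend,
  Is.Holiday    = is_holiday
)
flags
```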
I have also assumed that Delivery.Location and Trade.Date are sufficient to identify a single trade.
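If that assumption does not hold for your data, a row-wise expansion in base R avoids the grouping key altogether. The following is only a minimal sketch, not a replacement for the ddply call above: it uses a three-row sample taken from the question's data, and the helper names n_days, offsets and expanded are made up for illustration.

```r
# Three trades from the question's data; the middle one spans a weekend
df <- data.frame(
  Delivery.Location   = c("Henry", "Henry", "Henry"),
  Price               = c(2.9064360152398, 2.85132233314846, 2.89036418816388),
  Trade.Date          = as.Date(c("2012-01-05", "2012-01-06", "2012-01-09")),
  Delivery.Start.Date = as.Date(c("2012-01-06", "2012-01-07", "2012-01-10")),
  Delivery.End.Date   = as.Date(c("2012-01-06", "2012-01-09", "2012-01-10")),
  stringsAsFactors    = FALSE
)

# Number of delivery days covered by each row (the end date is inclusive)
n_days <- as.integer(df$Delivery.End.Date - df$Delivery.Start.Date) + 1L

# Repeat each row once per delivery day, keeping only the columns we want
expanded <- df[rep(seq_len(nrow(df)), times = n_days),
               c("Delivery.Location", "Price", "Trade.Date")]

# Day offsets 0, 1, ..., n - 1 within each original row, added to its start date
offsets <- sequence(n_days) - 1L
expanded$Delivery.Date <- rep(df$Delivery.Start.Date, times = n_days) + offsets

rownames(expanded) <- NULL
expanded
```

Because every input row is expanded on its own, duplicate Delivery.Location and Trade.Date combinations are not a problem here.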
Answer 1 (score: 1)
Would this work?
library(plyr)
lookuptable<-df[,2:3]
Trade.Date<-df[,4]
filluptable1<-as.data.frame(Trade.Date)
Trade.Date<-df[,5]
filluptable2<-as.data.frame(Trade.Date)
myfillstart<- join(filluptable1, lookuptable, by = "Trade.Date")
myfillstart<- rename(myfillstart, c(Trade.Date="Delivery.Start.Date"))
myfillstart<- rename(myfillstart, c(Price="Price.Start.Date"))
myfillend<- join(filluptable2, lookuptable, by = "Trade.Date")
myfillend<- rename(myfillend, c(Trade.Date="Delivery.End.Date"))
myfillend<- rename(myfillend, c(Price="Price.End.Date"))
finaldf<-cbind(df[,1:3],myfillstart,myfillend)
finaldf
Delivery.Location Price Trade.Date Delivery.Start.Date Price.Start.Date Delivery.End.Date Price.End.Date
35 Henry 2.965398 2012-01-03 2012-01-04 2.959077 2012-01-04 2.959077
150 Henry 2.959077 2012-01-04 2012-01-05 2.906436 2012-01-05 2.906436
263 Henry 2.906436 2012-01-05 2012-01-06 2.851322 2012-01-06 2.851322
377 Henry 2.851322 2012-01-06 2012-01-07 NA 2012-01-09 2.890364
493 Henry 2.890364 2012-01-09 2012-01-10 2.965585 2012-01-10 2.965585
607 Henry 2.965585 2012-01-10 2012-01-11 2.807734 2012-01-11 2.807734
724 Henry 2.807734 2012-01-11 2012-01-12 2.702072 2012-01-12 2.702072
838 Henry 2.702072 2012-01-12 2012-01-13 2.671732 2012-01-13 2.671732
955 Henry 2.671732 2012-01-13 2012-01-14 NA 2012-01-17 NA
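Note: since your locations are all the same, I did not look up the location, but you could do that as well. The code looks a bit messy. Here is an alternative you could go through.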
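For what it is worth, the same price lookup can probably be written more compactly with base R's match(). This is only a sketch on a three-row sample of the question's data, so more lookups come back as NA than in the full output above, and the new columns are appended at the end rather than interleaved with the date columns.

```r
# Three trades from the question's data
df <- data.frame(
  Delivery.Location   = c("Henry", "Henry", "Henry"),
  Price               = c(2.9064360152398, 2.85132233314846, 2.89036418816388),
  Trade.Date          = as.Date(c("2012-01-05", "2012-01-06", "2012-01-09")),
  Delivery.Start.Date = as.Date(c("2012-01-06", "2012-01-07", "2012-01-10")),
  Delivery.End.Date   = as.Date(c("2012-01-06", "2012-01-09", "2012-01-10")),
  stringsAsFactors    = FALSE
)

# match() gives the position of each delivery date among the trade dates,
# or NA when there was no trade on that date
start_idx <- match(df$Delivery.Start.Date, df$Trade.Date)
end_idx   <- match(df$Delivery.End.Date, df$Trade.Date)

# Indexing Price with NA positions yields NA prices
finaldf <- df
finaldf$Price.Start.Date <- df$Price[start_idx]
finaldf$Price.End.Date   <- df$Price[end_idx]
finaldf
```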