I want to expand some flattened records.
I have a table like this:
Store Min(Date) Max(Date) Status
NYC1 1/1/2013 2/1/2013 Open
NYC1 2/2/2013 2/3/2013 Closed for Inspection
Boston1 1/1/2013 2/5/2013 Open
I would like to expand it into the form:
Store Date Status
NYC1 1/1/2013 Open
NYC1 1/2/2013 Open
.....
NYC1 2/2/2013 Closed for Inspection
NYC1 2/3/2013 Closed for Inspection
....
Boston1 1/1/2013 Open
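I know I can always write a loop for this, but before trying that, I wanted to ask whether there is a quick and dirty way to do it.
Answer 0 (score: 4)
Here is one approach: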
mydf <- read.table(header = TRUE, stringsAsFactors=FALSE,
text = "Store Min(Date) Max(Date) Status
NYC1 1/1/2013 2/1/2013 Open
NYC1 2/2/2013 2/3/2013 'Closed for Inspection'
Boston1 1/1/2013 2/5/2013 Open")
names(mydf) <- c("store", "min.date", "max.date", "status")
mydf$min.date <- as.Date(mydf$min.date, format = "%m/%d/%Y")
mydf$max.date <- as.Date(mydf$max.date, format = "%m/%d/%Y")
mydf
# store min.date max.date status
# 1 NYC1 2013-01-01 2013-02-01 Open
# 2 NYC1 2013-02-02 2013-02-03 Closed for Inspection
# 3 Boston1 2013-01-01 2013-02-05 Open
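Use that information to "expand" the data.frame and generate the sequence of dates between "min.date" and "max.date". Also, subset the data.frame to return only the "store", "date" (our new variable), and "status" variables.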
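# Number of days covered by each row, inclusive of both endpoints
SEQ <- as.integer(mydf$max.date - mydf$min.date) + 1L
# Repeat each row once per day it covers
mydf2 <- mydf[rep(row.names(mydf), SEQ), ]
# Offset each copy from its row's start date: 0, 1, 2, ...
mydf2$date <- mydf2$min.date + sequence(SEQ) - 1
mydf2 <- mydf2[c("store", "date", "status")]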
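To see why the rep() and sequence() combination works, here is a minimal sketch on toy values. The names n_days, row_index, day_counter, and start are made up for illustration and are not part of the answer above.

```r
# Toy values, chosen only for illustration: two rows spanning 3 and 2 days.
n_days <- c(3L, 2L)

# rep() repeats each row index as many times as that row has days.
row_index <- rep(seq_along(n_days), n_days)
# 1 1 1 2 2

# sequence() restarts a 1..n counter for each row.
day_counter <- sequence(n_days)
# 1 2 3 1 2

# Subtracting 1 turns the counter into an offset from each row's start date.
start <- as.Date(c("2013-01-01", "2013-02-02"))
dates <- start[row_index] + day_counter - 1
# "2013-01-01" "2013-01-02" "2013-01-03" "2013-02-02" "2013-02-03"
```

Because both vectors are built from the same counts, they line up element by element, which is what lets the date column be filled in without a loop.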
Here is a sample of the output.
head(mydf2)
# store date status
# 1 NYC1 2013-01-01 Open
# 1.1 NYC1 2013-01-02 Open
# 1.2 NYC1 2013-01-03 Open
# 1.3 NYC1 2013-01-04 Open
# 1.4 NYC1 2013-01-05 Open
# 1.5 NYC1 2013-01-06 Open
tail(mydf2)
# store date status
# 3.30 Boston1 2013-01-31 Open
# 3.31 Boston1 2013-02-01 Open
# 3.32 Boston1 2013-02-02 Open
# 3.33 Boston1 2013-02-03 Open
# 3.34 Boston1 2013-02-04 Open
# 3.35 Boston1 2013-02-05 Open
You can use by to verify that we did everything correctly:
> with(mydf2, by(date, list(store, status), FUN = range))
: Boston1
: Closed for Inspection
NULL
-----------------------------------------------------------------
: NYC1
: Closed for Inspection
[1] "2013-02-02" "2013-02-03"
-----------------------------------------------------------------
: Boston1
: Open
[1] "2013-01-01" "2013-02-05"
-----------------------------------------------------------------
: NYC1
: Open
[1] "2013-01-01" "2013-02-01"
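The range check above can be complemented with a row count and a duplicate check. The following is only a sketch: it rebuilds a two-row version of the example so that it runs on its own, and the names ranges, n_days, and expanded are assumptions introduced here.

```r
# Rebuild a two-row version of the example so the check is self-contained.
ranges <- data.frame(
  store    = c("NYC1", "NYC1"),
  min.date = as.Date(c("2013-01-01", "2013-02-02")),
  max.date = as.Date(c("2013-02-01", "2013-02-03")),
  status   = c("Open", "Closed for Inspection"),
  stringsAsFactors = FALSE
)

# Same expansion technique as in the answer.
n_days <- as.integer(ranges$max.date - ranges$min.date) + 1L
expanded <- ranges[rep(seq_len(nrow(ranges)), n_days), ]
expanded$date <- expanded$min.date + sequence(n_days) - 1

# There should be exactly one output row per day in every range ...
stopifnot(nrow(expanded) == sum(n_days))
# ... and no (store, date) pair should appear twice.
stopifnot(anyDuplicated(expanded[c("store", "date")]) == 0)
```

The duplicate check would also flag overlapping ranges in the input, since those produce the same store and date more than once.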
Answer 1 (score: 2)
Using data.table for syntactic elegance (and assuming @Ananda's preprocessing):
mydf <- read.table(header = TRUE, stringsAsFactors=FALSE,
text = "Store Min(Date) Max(Date) Status
NYC1 1/1/2013 2/1/2013 Open
NYC1 2/2/2013 2/3/2013 'Closed for Inspection'
Boston1 1/1/2013 2/5/2013 Open")
names(mydf) <- c("store", "min.date", "max.date", "status")
mydf$min.date <- as.Date(mydf$min.date, format = "%m/%d/%Y")
mydf$max.date <- as.Date(mydf$max.date, format = "%m/%d/%Y")
library(data.table)
DT <- data.table(mydf)
DT[, list(dates = seq(min.date,max.date, by = 1)) , by = list(store,status)]
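One caveat: grouping by store and status assumes that each store/status pair occurs in only one row. If a store is "Open" in two separate ranges, seq() receives two start dates for that group and fails. A possible variant, sketched here on made-up data, groups by row number instead; the column name row_id and the sample values are assumptions for illustration.

```r
library(data.table)

# Hypothetical input: NYC1 is "Open" in two separate date ranges.
DT <- data.table(
  store    = c("NYC1", "NYC1", "NYC1"),
  min.date = as.Date(c("2013-01-01", "2013-01-04", "2013-01-06")),
  max.date = as.Date(c("2013-01-03", "2013-01-05", "2013-01-07")),
  status   = c("Open", "Closed for Inspection", "Open")
)

# Group by row number so seq() always gets length-one endpoints;
# store and status are recycled to the length of the date sequence.
expanded <- DT[, list(store, status,
                      date = seq(min.date, max.date, by = "day")),
               by = list(row_id = seq_len(nrow(DT)))]

# Drop the helper grouping column.
expanded[, row_id := NULL]
```

The result has one row per store and day, with columns store, status, and date.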
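Answer 2 (score: 0)
Green Demon,
Given that your question carries the reshape package tag, the simplest thing I can think of is to use the melt function. Let's call your data.frame 'foo'. The code below should do what you need.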
library(reshape)
foo.melt<-melt(foo, id.vars=c('Store','Status'))
Note that this creates an extra column, 'variable', whose entries indicate whether each value is the min date or the max date.
Cheers,
Danny