Converting min() and max() values to a range of values

Date: 2013-04-22 18:16:47

Tags: r dataframe reshape2

I want to expand some records that have been flattened into date ranges.

I have a table like this:

Store     Min(Date)     Max(Date)     Status

NYC1       1/1/2013      2/1/2013      Open
NYC1       2/2/2013      2/3/2013      Closed for Inspection
Boston1    1/1/2013      2/5/2013      Open

I would like to expand it into this form:

Store       Date        Status

 NYC1       1/1/2013     Open
 NYC1       1/2/2013     Open
 .....
 NYC1       2/2/2013     Closed for Inspection
 NYC1       2/3/2013     Closed for Inspection
 ....
 Boston1    1/1/2013     Open

I know I can always write a loop for this, but before trying that, I wanted to ask whether there is a quick and dirty way to do it.

3 answers:

Answer 0 (score: 4)

Here is one approach:

Read in your data and convert the dates to actual date variables:

mydf <- read.table(header = TRUE, stringsAsFactors=FALSE, 
text = "Store     Min(Date)     Max(Date)     Status
NYC1       1/1/2013      2/1/2013      Open
NYC1       2/2/2013      2/3/2013      'Closed for Inspection'
Boston1    1/1/2013      2/5/2013      Open")

names(mydf) <- c("store", "min.date", "max.date", "status")
mydf$min.date <- as.Date(mydf$min.date, format = "%m/%d/%Y")
mydf$max.date <- as.Date(mydf$max.date, format = "%m/%d/%Y")
mydf
#     store   min.date   max.date                status
# 1    NYC1 2013-01-01 2013-02-01                  Open
# 2    NYC1 2013-02-02 2013-02-03 Closed for Inspection
# 3 Boston1 2013-01-01 2013-02-05                  Open

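Calculate the difference in days between "min.date" and "max.date".

Use that information to "expand" the data.frame and generate the sequence of dates between "min.date" and "max.date". Also, subset the data.frame so that it returns only the "store", "date" (our new variable) and "status" variables.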

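# Number of days covered by each row, counting both endpoints
SEQ <- as.integer(mydf$max.date - mydf$min.date) + 1L
# Repeat each row once per day in its range
mydf2 <- mydf[rep(row.names(mydf), SEQ), ]
# Offset each repeated row's start date by 0, 1, 2, ... days
mydf2$date <- mydf2$min.date + sequence(SEQ) - 1

# Keep only the columns of interest
mydf2 <- mydf2[c("store", "date", "status")]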

Here is a sample of the output.

head(mydf2)
#     store       date status
# 1    NYC1 2013-01-01   Open
# 1.1  NYC1 2013-01-02   Open
# 1.2  NYC1 2013-01-03   Open
# 1.3  NYC1 2013-01-04   Open
# 1.4  NYC1 2013-01-05   Open
# 1.5  NYC1 2013-01-06   Open
tail(mydf2)
#        store       date status
# 3.30 Boston1 2013-01-31   Open
# 3.31 Boston1 2013-02-01   Open
# 3.32 Boston1 2013-02-02   Open
# 3.33 Boston1 2013-02-03   Open
# 3.34 Boston1 2013-02-04   Open
# 3.35 Boston1 2013-02-05   Open
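
To see why pairing rep() with sequence() produces one row per day, it may help to run the two calls on a toy input. This is only an illustrative sketch: the vector n and the labels "a" and "b" are made up for the example and are not part of the answer's data.

```r
# Toy run lengths: the first row spans 2 days, the second spans 3 days.
n <- c(2L, 3L)

# rep() with a vector of counts repeats each element that many times,
# which is how each original row gets duplicated once per day.
rows <- rep(c("a", "b"), n)
rows
# [1] "a" "a" "b" "b" "b"

# sequence() restarts counting at 1 for each run length;
# subtracting 1 turns the counts into day offsets starting at 0.
offsets <- sequence(n) - 1L
offsets
# [1] 0 1 0 1 2

# Adding the offsets to the repeated start dates yields consecutive days.
start <- rep(as.Date(c("2013-01-01", "2013-02-02")), n)
days <- start + offsets
days
# [1] "2013-01-01" "2013-01-02" "2013-02-02" "2013-02-03" "2013-02-04"
```

Because both calls are driven by the same vector of counts, the repeated rows and the offsets stay aligned without an explicit loop.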

You can use by to verify that we did everything correctly:

> with(mydf2, by(date, list(store, status), FUN = range))
: Boston1
: Closed for Inspection
NULL
----------------------------------------------------------------- 
: NYC1
: Closed for Inspection
[1] "2013-02-02" "2013-02-03"
----------------------------------------------------------------- 
: Boston1
: Open
[1] "2013-01-01" "2013-02-05"
----------------------------------------------------------------- 
: NYC1
: Open
[1] "2013-01-01" "2013-02-01"
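
Two further checks are cheap to run alongside the by() output: the expanded table should have exactly one row per day in every range, and no store should end up with two rows for the same day. The sketch below rebuilds the data so that it runs on its own; the names n_days, rows_ok and no_overlap are introduced here for illustration and do not appear in the answer.

```r
# Rebuild the answer's input (column names follow the answer).
mydf <- data.frame(
  store = c("NYC1", "NYC1", "Boston1"),
  min.date = as.Date(c("2013-01-01", "2013-02-02", "2013-01-01")),
  max.date = as.Date(c("2013-02-01", "2013-02-03", "2013-02-05")),
  status = c("Open", "Closed for Inspection", "Open"),
  stringsAsFactors = FALSE
)

# Expand to one row per day, as in the answer.
n_days <- as.integer(mydf$max.date - mydf$min.date) + 1L
mydf2 <- mydf[rep(seq_len(nrow(mydf)), n_days), ]
mydf2$date <- mydf2$min.date + sequence(n_days) - 1L
mydf2 <- mydf2[c("store", "date", "status")]

# Check 1: the row count equals the total number of days across all ranges.
rows_ok <- nrow(mydf2) == sum(n_days)

# Check 2: no store has more than one row (and hence status) for the same day.
no_overlap <- anyDuplicated(mydf2[c("store", "date")]) == 0L

rows_ok
no_overlap
```

If the second check fails on real data, the input ranges for a store overlap, which the expansion itself will not flag.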

Answer 1 (score: 2)

Using data.table for syntactic elegance (and assuming @Ananda's preprocessing):

mydf <- read.table(header = TRUE, stringsAsFactors=FALSE, 
text = "Store     Min(Date)     Max(Date)     Status
NYC1       1/1/2013      2/1/2013      Open
NYC1       2/2/2013      2/3/2013      'Closed for Inspection'
Boston1    1/1/2013      2/5/2013      Open")

names(mydf) <- c("store", "min.date", "max.date", "status")
mydf$min.date <- as.Date(mydf$min.date, format = "%m/%d/%Y")
mydf$max.date <- as.Date(mydf$max.date, format = "%m/%d/%Y")

library(data.table)
DT <- data.table(mydf)
DT[, list(dates = seq(min.date,max.date, by = 1)) , by = list(store,status)]
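
One caveat with grouping by store and status: the call relies on each (store, status) pair appearing in only one row. If a store has the same status in two separate ranges, seq() receives two start dates and stops with an error, because its 'from' argument must have length 1. A possible workaround, sketched here under that assumption, is to add a row identifier to the grouping. The fourth data row and the row_id column are hypothetical additions for illustration and are not part of the question's data.

```r
library(data.table)

# Sample data plus one hypothetical extra row: NYC1 reopens after the inspection.
DT <- data.table(
  store = c("NYC1", "NYC1", "Boston1", "NYC1"),
  min.date = as.Date(c("2013-01-01", "2013-02-02", "2013-01-01", "2013-02-04")),
  max.date = as.Date(c("2013-02-01", "2013-02-03", "2013-02-05", "2013-02-06")),
  status = c("Open", "Closed for Inspection", "Open", "Open")
)

# Tag every input row so that each group contains exactly one range.
DT[, row_id := .I]

# Expand each row into one record per day in its range.
out <- DT[, list(date = seq(min.date, max.date, by = "day")),
          by = list(row_id, store, status)]

# Drop the helper column once the expansion is done.
out[, row_id := NULL]
out
```

Grouping by row_id alone would also work, but keeping store and status in by carries them into the result without repeating them inside j.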

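Answer 2 (score: 0)

Green Demon,

Given that your question is tagged with the reshape package, the simplest thing I can think of is to use the melt function. Let's call your data.frame 'foo'. The code below should do what you need.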

library(reshape)
foo.melt<-melt(foo, id.vars=c('Store','Status'))

Note that this will create an extra column 'variable' whose values are min.date and max.date.

Cheers,

Danny