I have a dataset of the form:
df <- data.frame(var1 = c("1976-07-04" , "1980-07-04" , "1984-07-04" ),
var2 = c('d', 'e', 'f'),
freq = 1:3)
I can quickly expand this data.frame by indexing, like so:
df.expanded <- df[rep(seq_len(nrow(df)), df$freq), ]
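But instead of replicating the rows, I would like to create a sequence on the date, with freq giving the length of that sequence. For example, for row 3, I could create the entries to fill the exploded data.frame with: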
seq(as.Date('1984-7-4'), by = 'days', length = 3)
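Can anyone suggest a fast way to do this? My approach would be to use various lapply functions.
I ended up using a combination of Gavin Simpson's answer and my earlier solution.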
ExtendedSeq <- function(df, freq.col, date.col, period = 'month') {
#' An R function to take a data frame that has a frequency col and explode
#' the data frame to have that number of rows, based on a date sequence.
#' Args:
#' df: A data.frame to be exploded.
#' freq.col: A column variable indicating the number of replicates in the
#' new dataset to make.
#' date.col: A column variable indicating the name or position of the date
#' variable.
#' period: The periodicity to apply to the date.
# Replicate expanded data form
df.expanded <- df[rep(seq_len(nrow(df)), df[[freq.col]]), ]
DateExpand <- function(row, df.ex, freq, col.date, period) {
#' An inner function to explode a data set and build out a days sequence
#' Args:
#' row: Index of the row of the data set to expand
#' df.ex: A data.frame, to expand
#' freq: Column indicating the number of replicates to make.
#' col.date: Column indicating the date variable
#' Output:
#' An exploded data set based on a sequence expansion of a date.
times <- df.ex[row, freq]
# NOTE: `period` is accepted but not used yet; the step below is hard-coded
# to days. It can be wired in later if it becomes row / data driven.
date.ex <- seq(df.ex[row, col.date], by = "days", length = times)
return(date.ex)
}
dates <- lapply(seq_len(nrow(df)),
FUN = DateExpand,
df.ex = df,
freq = freq.col,
col.date = date.col,
period = period)
df.expanded[[date.col]] <- as.Date(unlist(dates), origin = '1970-01-01')
row.names(df.expanded) <- NULL
return(df.expanded)
}
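Personally, I don't like having to convert the dates back from a list and supply an origin for that conversion, in case this changes in the future, but I greatly appreciate the ideas and help.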
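One possible way around the origin argument, sketched here as a suggestion and not as part of the original solution, is to combine the list with c() instead of unlist(). unlist() drops the "Date" class and returns plain numbers, while do.call(c, ...) dispatches to the Date method of c() and keeps the class. The names flat.numeric and flat.dates are illustrative only.

```r
# Two Date vectors of different lengths, similar to what lapply() returns
# inside ExtendedSeq().
dates <- list(
  seq(as.Date("1980-07-04"), by = "day", length.out = 2),
  seq(as.Date("1984-07-04"), by = "day", length.out = 3)
)

# unlist() strips the class attribute, leaving days since 1970-01-01.
flat.numeric <- unlist(dates)

# c() keeps the "Date" class, so no origin has to be supplied.
flat.dates <- do.call(c, dates)

class(flat.numeric)
class(flat.dates)
```

Under this assumption, the assignment near the end of ExtendedSeq() could use do.call(c, dates) directly, provided the date column is already of class "Date".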
Answer 0 (score: 3)
Here is one way:
extendDF <- function(x) {
foo <- function(i, z) {
times <- z[i, "freq"]
out <- data.frame(seq(z[i, 1], by = "days", length = times),
rep(z[i, 2], times),
rep(z[i, 3], times))
names(out) <- names(z)
out
}
out <- lapply(seq_len(nrow(x)), FUN = foo, z = x)
do.call("rbind", out)
}
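This iterates over the indices 1:nrow(df) (i.e. the row indices of df), applying the inline function foo to each row of df. foo() basically just repeats var2 and freq freq times, and uses a seq() call to extend var1. The function makes some assumptions about column ordering, names, etc., but you can modify it as needed.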
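As an illustration of how those assumptions might be relaxed, here is a minimal sketch that takes the date and frequency columns by name and copies every other column unchanged. The function name extendByName and its arguments date.col and freq.col are hypothetical, not part of the answer above, and the date column is assumed to already be of class "Date".

```r
# Hypothetical variant of extendDF(): columns are selected by name, and any
# other columns are carried along unchanged.
extendByName <- function(x, date.col, freq.col) {
  pieces <- lapply(seq_len(nrow(x)), function(i) {
    times <- x[[freq.col]][i]
    # Repeat row i `times` times, then overwrite the date with a daily sequence.
    out <- x[rep(i, times), , drop = FALSE]
    out[[date.col]] <- seq(x[[date.col]][i], by = "day", length.out = times)
    out
  })
  out <- do.call(rbind, pieces)
  row.names(out) <- NULL
  out
}

df <- data.frame(var1 = as.Date(c("1976-07-04", "1980-07-04", "1984-07-04")),
                 var2 = c("d", "e", "f"),
                 freq = 1:3)
res <- extendByName(df, date.col = "var1", freq.col = "freq")
res
```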
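The only other point is that it is far more efficient to convert var1 to a "Date" object in one go than to convert each row in turn inside extendDF(), so do a single conversion first, here using transform():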
df <- transform(df, var1 = as.Date(var1))
Then call extendDF():
extendDF(df)
This gives:
R> df <- transform(df, var1 = as.Date(var1))
R> extendDF(df)
var1 var2 freq
1 1976-07-04 d 1
2 1980-07-04 e 2
3 1980-07-05 e 2
4 1984-07-04 f 3
5 1984-07-05 f 3
6 1984-07-06 f 3
Answer 1 (score: 1)
Short, not necessarily fast:
library(plyr)
adply(df, 1, summarize, var3 = seq(as.Date(var1), by = "days", length = freq))
# var1 var2 freq var3
# 1 1976-07-04 d 1 1976-07-04
# 2 1980-07-04 e 2 1980-07-04
# 3 1980-07-04 e 2 1980-07-05
# 4 1984-07-04 f 3 1984-07-04
# 5 1984-07-04 f 3 1984-07-05
# 6 1984-07-04 f 3 1984-07-06
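For comparison, the same var3 column can probably be built without a per-row loop. The sketch below is an alternative in base R, not the plyr approach above: it replicates the rows by index as in the question and adds day offsets generated by sequence(). It assumes a fixed step of one day; steps such as months would still need seq().

```r
df <- data.frame(var1 = c("1976-07-04", "1980-07-04", "1984-07-04"),
                 var2 = c("d", "e", "f"),
                 freq = 1:3)

# Replicate each row freq times, as in the question.
expanded <- df[rep(seq_len(nrow(df)), df$freq), ]

# sequence(1:3) gives 1, 1 2, 1 2 3; subtracting 1 turns these into day
# offsets from each row's start date.
expanded$var3 <- as.Date(expanded$var1) + (sequence(df$freq) - 1)
row.names(expanded) <- NULL
expanded
```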
Answer 2 (score: 0)
Another:
df <- data.frame(var1 = c("1976-07-04" , "1980-07-04" , "1984-07-04" ), var2 = c('d', 'e', 'f'), freq = 1:3)
df$id <- seq_len(nrow(df))
expanded <- apply(df[c("id","var1","freq")], MARGIN=1, FUN=function(x) {
result <- seq.Date(as.Date(x["var1"]), length.out = as.integer(x["freq"]), by = "day")
data.frame(id = rep(as.integer(x["id"]), length(result)), result=result)
})
expanded <- do.call(rbind, expanded)
expanded <- plyr::join(x = expanded, y = df, by = "id", type = "left", match = "first")
head(expanded)