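Below are the input and the output. I want to convert the data frame from the input format to the output format.
I have written some code that does the job, but I feel it is very inefficient. Is there a better package or function for this?
My code: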
# create an empty output data frame to be appended to later
output = data.frame(id1 = character(0), id2 = character(0), dates = character(0))
# for loop to get all possible combinations of dates
for (i in seq_len(nrow(input))) {
  end = as.Date('2016-07-18')
  # number of days in the range, counting both endpoints
  len = as.numeric(end - input$min_date[i]) + 1
  output = rbind(output, as.data.frame(cbind(
    id1 = rep(input$id1[i], len),
    id2 = rep(input$id2[i], len),
    dates = as.character(seq(input$min_date[i], end, by = 'day'))
  )))
}
Input:
+------+--------+------------+------------+
| id1 | id2 | min_date | max_date |
+------+--------+------------+------------+
| 3575 | 155443 | 2012-06-18 | 2016-07-18 |
| 3575 | 155450 | 2012-06-12 | 2016-07-18 |
+------+--------+------------+------------+
Output:
+------+--------+------------+
| id1 | id2 | dates |
+------+--------+------------+
| 3575 | 155443 | 2012-06-18 |
| 3575 | 155443 | 2012-06-19 |
| 3575 | 155443 | 2012-06-20 |
| 3575 | 155443 | … |
| 3575 | 155443 | … |
| 3575 | 155443 | 2016-07-18 |
| | | |
| 3575 | 155450 | 2012-06-12 |
| 3575 | 155450 | 2012-06-13 |
| 3575 | 155450 | 2012-06-14 |
| 3575 | 155450 | … |
| 3575 | 155450 | … |
| 3575 | 155450 | 2016-07-18 |
+------+--------+------------+
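Answer 0: (score: 2)
Assuming the 'min_date' / 'max_date' columns are of class Date, we use Map to get the sequence from each 'min_date' to the corresponding 'max_date' in a list ('lst'), replicate the row indices of 'df1' by the lengths of the list elements ('i1'), create a data.frame by expanding the dataset based on 'i1', and build the 'dates' column by concatenating the elements of 'lst'.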
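# one daily sequence per row, stored in a list
lst <- Map(function(x, y) seq(x, y, by = "1 day"), df1$min_date, df1$max_date)
# repeat each row index once per generated date
i1 <- rep(seq_len(nrow(df1)), lengths(lst))
# drop both date columns (3 and 4) and attach the concatenated sequences
data.frame(df1[i1, -(3:4)], dates = do.call("c", lst))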
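To try the base R approach end to end, here is a minimal sketch. It assumes 'df1' is rebuilt by hand from the two rows in the question's input table; the names 'out' and 'expected_rows' are only illustrative. The final check relies on each row contributing max_date - min_date + 1 days.

```r
# Hypothetical reconstruction of the question's input; the name df1 follows the answer
df1 <- data.frame(
  id1 = c(3575, 3575),
  id2 = c(155443, 155450),
  min_date = as.Date(c("2012-06-18", "2012-06-12")),
  max_date = as.Date(c("2016-07-18", "2016-07-18"))
)

# One daily sequence per row, kept in a list
lst <- Map(function(x, y) seq(x, y, by = "1 day"), df1$min_date, df1$max_date)

# Repeat each row index once per generated date
i1 <- rep(seq_len(nrow(df1)), lengths(lst))

# Keep only the id columns and attach the concatenated dates
out <- data.frame(df1[i1, c("id1", "id2")],
                  dates = do.call("c", lst),
                  row.names = NULL)

# Sanity check: each input row should contribute (max_date - min_date + 1) days
expected_rows <- sum(as.integer(df1$max_date - df1$min_date) + 1L)
stopifnot(nrow(out) == expected_rows)

head(out, 3)
```

Building all sequences first and combining them once avoids the repeated rbind() calls of the loop in the question, which copy the growing data frame on every iteration.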
Or, if we use dplyr:
library(dplyr)
df1 %>%
rowwise() %>%
do(data.frame(.[1:2], date = seq(.$min_date, .$max_date, by = "1 day")))
Or, using data.table, we can do this in a single line of code:
library(data.table)
setDT(df1)[,.(date = seq(min_date, max_date, by = "1 day")) ,.(id1, id2)]
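One caveat: grouping by id1 and id2 relies on each (id1, id2) pair appearing in only one row, because seq() expects a single start and a single end date per group. The following is a minimal sketch, assuming data.table is installed and that pairs may repeat; the sample values and the helper column 'row_id' are illustrative and not from the question. It groups by a row index instead.

```r
library(data.table)

# Illustrative data: the (id1, id2) pair is deliberately repeated
dt <- data.table(
  id1 = c(3575, 3575),
  id2 = c(155443, 155443),
  min_date = as.Date(c("2016-07-01", "2016-07-10")),
  max_date = as.Date(c("2016-07-03", "2016-07-11"))
)

# Group by a helper row index so seq() always sees one start and one end date
res <- dt[, .(id1 = id1, id2 = id2,
              dates = seq(min_date, max_date, by = "1 day")),
          by = .(row_id = seq_len(nrow(dt)))]

# Drop the helper column once the expansion is done
res[, row_id := NULL]
```

With unique pairs, as in the question's sample, the one-liner above is enough and this variant is unnecessary.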
Answer 1: (score: 0)
You can use the dplyr and splitstackshape packages:
library(dplyr)
library(splitstackshape)
df %>%
group_by(id1, id2) %>%
mutate(dates = paste(seq(as.Date(min_date),as.Date(max_date),by = 1), collapse = ',')) %>%
select(-c(min_date, max_date)) %>%
cSplit('dates', ',', 'long')