Question

Consider a data frame of the form

       idnum      start        end
1993.1    17 1993-01-01 1993-12-31
1993.2    17 1993-01-01 1993-12-31
1993.3    17 1993-01-01 1993-12-31

where start and end are of type Date:

 $ idnum : int  17 17 17 17 27 27
 $ start : Date, format: "1993-01-01" "1993-01-01" "1993-01-01" "1993-01-01" ...
 $ end   : Date, format: "1993-12-31" "1993-12-31" "1993-12-31" "1993-12-31" ...

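I would like to create a new data frame that instead has one monthly observation per row, for every month between start and end (boundaries included):

Desired output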

idnum       month
   17  1993-01-01
   17  1993-02-01
   17  1993-03-01
...
   17  1993-11-01
   17  1993-12-01

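I'm not sure what format month should have. At some point I will want to group by idnum and month in order to run regressions against the rest of the data set.

So far, for a single row, seq(from=test[1,'start'], to=test[1, 'end'], by='1 month') gives the right sequence, but as soon as I try to apply it to the whole data frame it fails: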

> foo <- apply(test, 1, function(x) seq(x['start'], to=x['end'], by='1 month'))
Error in to - from : non-numeric argument to binary operator
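
The error arises because apply() first coerces the data frame to a matrix. With an integer column next to Date columns that matrix is character, so x['start'] is a string and seq() cannot do date arithmetic on it. A minimal base R sketch that avoids the coercion by walking the columns in parallel with Map(); the sample values and the names pieces and res are assumptions made for illustration:

```r
# Illustrative data; the values are assumptions, not the asker's full data set.
test <- data.frame(
  idnum = c(17L, 27L),
  start = as.Date(c("1993-01-01", "1993-03-01")),
  end   = as.Date(c("1993-12-31", "1993-05-31"))
)

# apply() works on as.matrix(test), which is a character matrix here.
is.character(as.matrix(test))

# Map() iterates over the three columns in parallel and keeps the Date class,
# so seq() receives real dates for each row.
pieces <- Map(
  function(id, from, to) {
    data.frame(idnum = id, month = seq(from, to, by = "month"))
  },
  test$idnum, test$start, test$end
)

# Stack the per-row data frames into one long data frame.
res <- do.call(rbind, pieces)
```

Here res has one row per month and per input row, and month stays a Date, which is usable directly as a grouping key.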

Answer 1

Using data.table:

require(data.table) ## 1.9.2+
setDT(df)[ , list(idnum = idnum, month = seq(start, end, by = "month")), by = 1:nrow(df)]

# you may use dot notation as a shorthand alias of list in j:
setDT(df)[ , .(idnum = idnum, month = seq(start, end, by = "month")), by = 1:nrow(df)]

setDT converts df to a data.table by reference. Then, for each row (by = 1:nrow(df)), we create idnum and month as required.
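
Because setDT() modifies df in place, a variant that leaves the original data frame untouched may be preferable in some scripts. A small self-contained sketch of that variant, assuming the data.table package is installed; the sample values and the names dt and out are illustrative:

```r
library(data.table)

# Illustrative data; the values are assumptions.
df <- data.frame(
  idnum = c(17L, 27L),
  start = as.Date(c("1993-01-01", "1993-03-01")),
  end   = as.Date(c("1993-12-31", "1993-05-31"))
)

# as.data.table() returns a copy, so df itself stays a plain data.frame.
dt <- as.data.table(df)

# Group by row number: each group holds exactly one start and one end,
# and the length-1 idnum is recycled along the monthly sequence.
out <- dt[, .(idnum = idnum, month = seq(start, end, by = "month")),
          by = seq_len(nrow(dt))]
```

The grouping column produced by the by expression can be dropped afterwards if it is not needed.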

Answer 2

Using dplyr:

library(dplyr)

test %>%
    group_by(idnum) %>%
    summarize(start=min(start),end=max(end)) %>%
    rowwise() %>%  # one idnum per row, so seq() receives scalar endpoints
    do(data.frame(idnum=.$idnum, month=seq(.$start,.$end,by="1 month")))

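Note that here I do not generate a sequence between start and end for each row. Instead, it is a sequence between min(start) and max(end) for each idnum. If you want the former: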

test %>%
    rowwise() %>%
    do(data.frame(idnum=.$idnum, month=seq(.$start,.$end,by="1 month")))
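
do() is superseded in recent dplyr releases. The sketch below shows both variants with reframe() instead, assuming dplyr 1.1.0 or later; the sample data and the names per_row and per_id are illustrative, chosen so that the two variants give different results:

```r
library(dplyr)

# Two rows for the same idnum with a gap between their date ranges.
test <- data.frame(
  idnum = c(17L, 17L),
  start = as.Date(c("1993-01-01", "1993-06-01")),
  end   = as.Date(c("1993-03-31", "1993-07-31"))
)

# One sequence per input row: Jan-Mar from row 1, Jun-Jul from row 2.
per_row <- test %>%
  rowwise() %>%
  reframe(idnum = idnum, month = seq(start, end, by = "1 month"))

# One sequence per idnum, from min(start) to max(end): Jan-Jul, gap included.
per_id <- test %>%
  group_by(idnum) %>%
  reframe(month = seq(min(start), max(end), by = "1 month"))
```

The per-row variant skips April and May, while the per-idnum variant fills them in, which is the distinction drawn above.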

Answer 3

tidyverse answer

Data

df <- structure(list(idnum = c(17L, 17L, 17L), start = structure(c(8401, 
8401, 8401), class = "Date"), end = structure(c(8765, 8765, 8765
), class = "Date")), class = "data.frame", .Names = c("idnum", 
"start", "end"), row.names = c(NA, -3L))

Answer 4

Update 2

With the newer versions of purrr (0.3.0) and dplyr (0.8.0), this can be done with map2:

library(dplyr)
library(purrr)
library(tidyr)  # provides unnest()
 test %>%
     # sequence of monthly dates for each corresponding start, end elements
     transmute(idnum, month = map2(start, end, seq, by = "1 month")) %>%
     # unnest the list column
     unnest(month) %>%
     # remove any duplicate rows
     distinct()

Update

Based on @Ananda Mahto's comment:

 library(reshape2)  # provides melt()
 res1 <- melt(setNames(lapply(1:nrow(test), function(x) seq(test[x, "start"],
 test[x, "end"], by = "1 month")), test$idnum))

Also,

  res2 <- setNames(do.call(`rbind`,
          with(test, 
          Map(`expand.grid`,idnum,
          Map(`seq`, start, end, by='1 month')))), c("idnum", "month"))


  head(res2)
 #  idnum      month
 #1    17 1993-01-01
 #2    17 1993-02-01
 #3    17 1993-03-01
 #4    17 1993-04-01
 #5    17 1993-05-01
 #6    17 1993-06-01

Answer 5

Assuming you want a monthly sequence per ID (here, idnum), another tidyverse possibility, using tidyr::complete(), is:

library(dplyr)
library(tidyr)

df %>%
 gather(var, date, -idnum) %>%
 group_by(idnum) %>%
 distinct() %>%
 complete(date = seq.Date(min(date), max(date), by = "month")) %>%
 select(-var)

   date       idnum
   <date>     <int>
 1 1993-01-01    17
 2 1993-02-01    17
 3 1993-03-01    17
 4 1993-04-01    17
 5 1993-05-01    17
 6 1993-06-01    17
 7 1993-07-01    17
 8 1993-08-01    17
 9 1993-09-01    17
10 1993-10-01    17
11 1993-11-01    17
12 1993-12-01    17

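It first converts the data from wide to long format, excluding the variable idnum. Second, it groups the data by idnum. Third, it removes duplicate rows within each idnum. Fourth, by using seq.Date() inside tidyr::complete(), it generates a monthly sequence (per idnum) that starts at the first month in the data and ends at the last month in the data. Finally, it removes the redundant var variable.

Assuming you want a monthly sequence per row, the code above can be modified to: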

df %>%
 rowid_to_column() %>%
 gather(var, date, -c(idnum, rowid)) %>%
 group_by(rowid) %>%
 complete(date = seq.Date(min(date), max(date), by = "month")) %>%
 fill(idnum, .direction = "down") %>%
 select(-var)

   rowid date       idnum
   <int> <date>     <int>
 1     1 1993-01-01    17
 2     1 1993-02-01    17
 3     1 1993-03-01    17
 4     1 1993-04-01    17
 5     1 1993-05-01    17
 6     1 1993-06-01    17
 7     1 1993-07-01    17
 8     1 1993-08-01    17
 9     1 1993-09-01    17
10     1 1993-10-01    17
11     1 1993-11-01    17
12     1 1993-12-01    17
13     2 1993-01-01    17
14     2 1993-02-01    17
15     2 1993-03-01    17

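In this case, it first creates a unique row ID. Second, it converts the data from wide to long format, excluding the variables idnum and rowid. Third, it groups the data by rowid. Fourth, it generates the monthly sequence for each row ID. Finally, it fills the missing values in idnum and removes the redundant var variable.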

Expand rows by date range using start and end date
