Question

This is my sample data.frame:

df = read.table(text = 'ID Day Count Count_group
                18  1933    6   15
                33  1933    6   15
                37  1933    6   15
                18  1933    6   15
                16  1933    6   15
                11  1933    6   15
                111 1932    5   9
                34  1932    5   9
                60  1932    5   9
                88  1932    5   9
                18  1932    5   9
                33  1931    3   4
                13  1931    3   4
                56  1931    3   4
                23  1930    1   1
                6   1800    6   12
                37  1800    6   12
                98  1800    6   12
                52  1800    6   12
                18  1800    6   12
                76  1800    6   12
                55  1799    4   6
                6   1799    4   6
                52  1799    4   6
                133 1799    4   6
                112 1798    2   2
                677 1798    2   2
                778 888     4   8
                111 888     4   8
                88  888     4   8
                10  888     4   8
                37  887     2   4
                26  887     2   4
                8   886     1   2
                56  885     1   1
                22  120     2   6
                34  120     2   6
                88  119     1   6
                99  118     2   5
                12  118     2   5
                90  117     1   3
                22  115     2   2
                99  115     2   2', header = TRUE)

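The Count column shows the number of ID observations within a Day; Count_group shows the number of ID observations within that Day and its previous 4 days.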
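To make the two definitions concrete, here is a minimal sketch that recomputes both columns from the Day values alone. The helper name `window_counts` and its `n` argument are illustrative assumptions and not part of the original post; it assumes one row per observation and a numeric Day column.

```r
# Hypothetical helper: recompute Count and Count_group for a window of n days.
window_counts <- function(day, n = 5) {
  per_day <- table(day)                  # observations per distinct day
  days <- as.numeric(names(per_day))
  count <- as.numeric(per_day)
  # For each day d, sum the counts of days d, d - 1, ..., d - (n - 1);
  # days with no rows simply contribute nothing.
  count_group <- vapply(days, function(d) {
    sum(count[days <= d & days > d - n])
  }, numeric(1))
  data.frame(Day = days, Count = count, Count_group = count_group)
}

# Day values of the first block of the sample data (6, 5, 3 and 1 rows)
day <- rep(c(1933, 1932, 1931, 1930), times = c(6, 5, 3, 1))
res <- window_counts(day, n = 5)
res[order(res$Day, decreasing = TRUE), ]
```

On this block the recomputed Count_group values agree with the ones in the sample data (15, 9, 4 and 1 for days 1933 down to 1930).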

I need to expand {{1}} so that every day is shown in each df episode.

Expected output:

Count_group

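Explanation of the output:

1) Day 1933 has 6 IDs on that exact day (Count col) and 15 IDs in total from day 1933 back to day 1929 (Count_group col). The value 15 comes from 6 (1933) + 5 (1932) + 3 (1931) + 1 (1930) + 0 (1929). So in the output I added all the remaining days to the Count_group = 15 episode.

2) The next day in descending order is 1932. It has 5 IDs on that exact day and 9 IDs in total from day 1932 back to day 1928. The value 9 comes from 5 (1932) + 3 (1931) + 1 (1930) + 0 (1929) + 0 (1928). In the output (row 28) you will see the complete (5-day) episode for day 1932, 9 rows in total.

3) The next day is 1931... and so on.

The output data.frame is ordered by Count_group and Day, both with decreasing = TRUE.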

I am trying to write code that works not only for a 5-day window (as above) but for a time window of any n days.
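For illustration, one possible reading of this requirement can be sketched as follows: for every distinct Day, gather the rows of that day and of the n - 1 days before it into one episode. The helper `expand_window`, the `Episode_day` column and the `small` data frame are assumptions introduced for this sketch; whether this matches the intended output exactly is not confirmed by the post.

```r
# Hypothetical helper: for every distinct Day d, collect the rows whose Day lies
# in the window d, d - 1, ..., d - (n - 1) and label them with their episode.
expand_window <- function(df, n = 5) {
  days <- sort(unique(df$Day), decreasing = TRUE)
  episodes <- lapply(days, function(d) {
    rows <- df[df$Day <= d & df$Day > d - n, c("ID", "Day")]
    rows$Episode_day <- d              # latest day of the episode (assumed name)
    rows$Count_group <- nrow(rows)     # total rows inside the n-day window
    rows
  })
  out <- do.call(rbind, episodes)
  # Order by Count_group, then episode, then Day, all descending
  out <- out[order(out$Count_group, out$Episode_day, out$Day, decreasing = TRUE), ]
  rownames(out) <- NULL
  out
}

# Small stand-in for the first block of the sample data
small <- data.frame(
  ID  = c(18, 33, 37, 18, 16, 11, 111, 34, 60, 88, 18, 33, 13, 56, 23),
  Day = rep(c(1933, 1932, 1931, 1930), times = c(6, 5, 3, 1))
)
expanded <- expand_window(small, n = 5)
table(expanded$Episode_day)
```

Changing `n` changes the window length without touching the rest of the code, which is the generalization asked for here.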

Do you have any suggestions?

Thanks

Answer 1

Try this and let me know whether it is what you had in mind:

# First I split the dataframe by each day using split()
duplicates <- lapply(split(df, df$Day), function(x){
  if(nrow(x) != x[1,"Count_group"]) { # check if # of rows != the number you want
    x[rep(1:nrow(x), length.out = x[1,"Count_group"]),] # repeat them until you get it
  } else {
    x
  }
})

df2 <- do.call("rbind.data.frame", duplicates) # turn the list back into a dataframe
df3 <- df2[order(df2[,"Count_group"], df2[,"Day"], decreasing = TRUE), ] # order by Count_group, then Day, both descending
rownames(df3) <- NULL # names back to 1:X instead of the generated ones
df3 # the result
