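Question

I have a data frame with multiple columns, but two columns in particular interest me. Column1 contains 0 and positive numbers (> 0). Column2 also contains numbers.

I want to create 21 new columns that hold information from Column2, conditional on Column1.

So when Column1 is positive (non-zero), I want the first new column, Column01, to take the Column2 value from 10 rows back, Column02 the value from 9 rows back, and so on. Column11 is exactly the Column2 value of the same row, and Column21 is the value 10 rows forward.

For example: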

  Column1   Column2   Column01  Column02 ..  Column11 .. Column20  Column21
      0        5          0         0           0          0         0
      0        2          0         0           0          0         0 
      0        0          0         0           0          0         0  
      1        3          0         0           3          5         4
      0        10         0         0           0          0         0
      0        83         0         0           0          0         0
      0        2          0         0           0          0         0
      0        5          0         0           0          0         0
      0        4          0         0           0          0         0
      1        8          0         5           8          5         3
      0        6          0         0           0          0         0
      0        5          0         0           0          0         0
      0        55         0         0           0          0         0
      0        4          0         0           0          0         0
      2        3          10       83           3          5         0
      0        2          0         0           0          0         0
      0        3          0         0           0          0         0
      0        4          0         0           0          0         0
      0        5          0         0           0          0         0
      0        3          0         0           0          0         0
      1        22         6         5          22          0         0
      0        12         0         0           0          0         0
      0        0          0         0           0          0         0
      0        5          0         0           0          0         0

I hope this makes sense and that you can help.

Answer 1

One way, using the shift() function newly implemented in data.table v1.9.5:

require(data.table) ## v1.9.5+
setDT(dat)                                                      ## (1)
cols = paste0("cols", sprintf("%.2d", 1:21))                    ## (2)
dat[, cols[1:10] := shift(Column2, 10:1, fill=0)]               ## (3)
dat[, cols[11] := Column2]                                      ## (4)
dat[, cols[12:21] := shift(Column2, 1:10, fill=0, type="lead")] ## (5)
dat[Column1 == 0, (cols) := 0]                                  ## (6)

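(1) Assuming dat is your data.frame, setDT(dat) converts it to a data.table by reference. The data is not physically copied to a new location in memory, which keeps this step efficient.
(2) Generate all the column names.
(3) Generate the lagged vectors of Column2 for periods 10:1 and assign them to the first 10 columns.
(4) The 11th column is Column2 itself.
(5) Generate the lead vectors of Column2 for periods 1:10 and assign them to the last 10 columns.
(6) Get the indices of all rows where Column1 == 0 and reset all the newly generated columns to 0 for those rows.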
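
To make steps (3) and (5) concrete, here is a minimal sketch of what shift() returns on a toy vector. The vector x is made up for illustration and is not part of the original data. When several offsets are passed, shift() returns a list with one element per offset, which is what lets := fill several columns in one call.

```r
library(data.table)  # provides shift()

x <- c(5, 2, 0, 3, 10)  # hypothetical toy values standing in for Column2

# Lag by 2 and by 1: earlier values move down, gaps at the top are filled with 0
lags <- shift(x, 2:1, fill = 0)
lags[[1]]  # 0 0 5 2 0
lags[[2]]  # 0 5 2 0 3

# Lead by 1: later values move up, the gap at the bottom is filled with 0
lead1 <- shift(x, 1, fill = 0, type = "lead")
lead1      # 2 0 3 10 0
```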

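If you want a data.frame back, use setDF(dat).

You could wrap this in a function that takes the values -10:10 and picks type="lag" or type="lead" depending on whether each value is negative or positive. I'll leave that to you.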
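
One possible shape for such a wrapper is sketched below. The function name add_window_cols, its arguments, and the toy data are all hypothetical and not from the answer. It uses set() rather than := so that column names can be passed as strings, and it tests the flag column for non-zero values instead of assuming 0/1.

```r
library(data.table)

# Hypothetical helper: adds one column per offset in `offsets`, by reference.
# Negative offsets look back (lag), 0 copies the column, positive look ahead (lead).
add_window_cols <- function(dt, value_col, flag_col, offsets = -10:10) {
  cols <- paste0("cols", sprintf("%.2d", seq_along(offsets)))
  x <- dt[[value_col]]
  for (i in seq_along(offsets)) {
    k <- offsets[i]
    shifted <- if (k < 0) {
      shift(x, -k, fill = 0, type = "lag")
    } else if (k > 0) {
      shift(x, k, fill = 0, type = "lead")
    } else {
      x
    }
    set(dt, j = cols[i], value = shifted)
  }
  # Reset the new columns wherever the flag column is 0
  zero_rows <- which(dt[[flag_col]] == 0)
  for (col in cols) set(dt, i = zero_rows, j = col, value = 0)
  invisible(dt)
}

# Toy data (made up): a window of one row back and one row forward
toy <- data.table(Column1 = c(0, 1, 0, 2, 0), Column2 = c(5, 2, 0, 3, 10))
add_window_cols(toy, "Column2", "Column1", offsets = -1:1)
toy
```

With the default offsets = -10:10 this should produce the same 21 columns as steps (2) to (6), under the assumption that dt is already a data.table.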

Answer 2

An option using base R:

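cols = paste0("cols", sprintf("%.2d", 1:21)) #copied from @Arun's post
# Pad Column2 with 10 leading zeros and let matrix() recycle it; with one more
# row than the padded length, each column starts one position later
m1 <- matrix(c(rep(0,10), dat1[,2]), nrow=nrow(dat1)+10+1, ncol=21, 
              dimnames=list(NULL, cols))[1:nrow(dat1),]
# Zero out rows where Column1 is 0; the comparison keeps Column2 values
# unscaled when Column1 holds values other than 0 and 1
dat2 <- cbind(dat1, m1*(dat1[,1] != 0))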

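Note: creating m1 raises a warning, because the length of the data passed to matrix() is not a multiple of the number of rows. The recycling that triggers it is intentional here.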
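
The recycling behind m1 is easier to follow on a small case. The sketch below uses made-up values and a window of 2 instead of 10; x and k are illustrative names, not part of the answer. suppressWarnings() is used only to silence the expected length warning.

```r
x <- 1:5   # hypothetical toy values standing in for Column2
k <- 2     # window half-width (10 in the answer above)

# The padded vector has length n + k, but each column has n + k + 1 rows,
# so every new column starts one position later in the recycled vector.
m <- suppressWarnings(
  matrix(c(rep(0, k), x), nrow = length(x) + k + 1, ncol = 2 * k + 1)
)[seq_along(x), ]

m
#      [,1] [,2] [,3] [,4] [,5]
# [1,]    0    0    1    2    3
# [2,]    0    1    2    3    4
# [3,]    1    2    3    4    5
# [4,]    2    3    4    5    0
# [5,]    3    4    5    0    0
```

Columns 1 and 2 are the lags by 2 and 1, column 3 is x itself, and columns 4 and 5 are the leads by 1 and 2. The zeros at the bottom of the lead columns come from the recycled vector wrapping around into its zero padding.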

Checking against the output of @Arun's solution (after running his code on 'dat'):

library(data.table)
setDF(dat) #convert the 'data.table' to 'data.frame'
all.equal(dat2, dat, check.attributes=FALSE)
#[1] TRUE

Data

set.seed(24)
dat1 <- data.frame(Column1 = sample(0:1,10, replace=TRUE),
          Column2 = sample(1:5, 10, replace=TRUE))

dat <- copy(dat1)
