
Question

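I would like to replace the monthly values with zero starting from a specific month. I tried adapting Replace NA values in dataframe starting in varying columns, but without success. Given the data: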

df <- structure(list(Mth1 = c(1L, 3L, 4L, 1L, 2L), 
                      Mth2 = c(2L, 3L, 2L, 2L, 2L),
                      Mth3 = c(1L, 2L, 1L, 2L, 3L), 
                      Mth4 = c(3L, 1L, 3L, 4L, 2L),
                      ZeroMth = c(1L, 3L, 2L, 4L, 3L)),
                 .Names = c("Mth1", "Mth2", "Mth3", "Mth4", "ZeroMth"), class = "data.frame", 
                 row.names = c("1", "2", "3", "4", "5"))


> df
  Mth1 Mth2 Mth3 Mth4 ZeroMth
1    1    2    1    3       1
2    3    3    2    1       3
3    4    2    1    3       2
4    1    2    2    4       4
5    2    2    3    2       3

I want the value in the ZeroMth column to specify the month from which the replacement starts. The desired output is:

> df1
  Mth1 Mth2 Mth3 Mth4
1    0    0    0    0
2    3    3    0    0 
3    4    0    0    0
4    1    2    2    0
5    2    2    0    0

Answer 1

Use replace on each row (apply with MARGIN = 1) and set to zero the values from the index given in the last column through the last month column.
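The same expression appears as db.mat in the benchmarks further down. A minimal sketch of the approach on the question's data, assuming every ZeroMth lies between 1 and the number of month columns (a larger value would make the index sequence x[n]:(n - 1) run backwards and zero the wrong cells). Note that the row-wise result still contains the ZeroMth column, so it is dropped at the end:

```r
# Sample data from the question (ZeroMth is the last column)
df <- data.frame(Mth1 = c(1L, 3L, 4L, 1L, 2L),
                 Mth2 = c(2L, 3L, 2L, 2L, 2L),
                 Mth3 = c(1L, 2L, 1L, 2L, 3L),
                 Mth4 = c(3L, 1L, 3L, 4L, 2L),
                 ZeroMth = c(1L, 3L, 2L, 4L, 3L))

n <- NCOL(df)  # position of ZeroMth; the month columns are 1..(n - 1)

# apply() works row by row on a matrix copy of df; for each row x,
# replace() sets positions ZeroMth..(n - 1) to 0.
# Assumes 1 <= ZeroMth <= n - 1 for every row.
res <- t(apply(X = df, MARGIN = 1, function(x)
  replace(x = x, list = x[n]:(n - 1), values = 0)))

# The result still holds ZeroMth; drop it to match the desired output
res[, -n]
```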

Answer 2

You can also use lapply, like this:

setNames(data.frame(lapply(head(seq_along(df), -1), function(i) df[, i] * (i < df$ZeroMth))),
         head(names(df), -1))

which returns

  Mth1 Mth2 Mth3 Mth4
1    0    0    0    0
2    3    3    0    0
3    4    0    0    0
4    1    2    2    0
5    2    2    0    0

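Here you loop over the positions of the month columns and, for each row, check whether the column position i is less than that row's ZeroMth. Multiplying by this logical vector keeps the value where the check is TRUE and gives 0 otherwise. setNames is used to restore the variable names.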
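To see the multiplication trick on a single column, here is a sketch for i = 2 (Mth2), with the two vectors copied by hand from the question's data. Comparing the position with ZeroMth yields a logical vector, and arithmetic coerces TRUE/FALSE to 1/0:

```r
# Vectors copied from the question's data
zero_mth <- c(1L, 3L, 2L, 4L, 3L)  # ZeroMth
mth2     <- c(2L, 3L, 2L, 2L, 2L)  # Mth2, i.e. column position i = 2

keep <- 2 < zero_mth  # TRUE where month 2 comes before the row's zero month
keep                  # FALSE TRUE FALSE TRUE TRUE
mth2 * keep           # TRUE -> 1, FALSE -> 0, giving 0 3 0 2 2
```

The result matches the Mth2 column of the desired output.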

Some benchmarks

In testing, switching from lapply to sapply gave a more than 2x speedup. Most of the slowdown comes from the conversion to a data.frame.
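One visible difference between the two variants is what data.frame() receives: sapply simplifies the four equal-length results into a single 5 x 4 matrix, while lapply hands over a list of four separate vectors. A quick check of the return types on the question's data (this only shows the types; it does not by itself explain the cost of the conversion):

```r
# Sample data from the question
df <- data.frame(Mth1 = c(1L, 3L, 4L, 1L, 2L),
                 Mth2 = c(2L, 3L, 2L, 2L, 2L),
                 Mth3 = c(1L, 2L, 1L, 2L, 3L),
                 Mth4 = c(3L, 1L, 3L, 4L, 2L),
                 ZeroMth = c(1L, 3L, 2L, 4L, 3L))

months <- head(seq_along(df), -1)  # positions 1..4; ZeroMth is left out

as_list <- lapply(months, function(i) df[, i] * (i < df$ZeroMth))
as_mat  <- sapply(months, function(i) df[, i] * (i < df$ZeroMth))

str(as_list)  # a list of 4 integer vectors
dim(as_mat)   # 5 4: sapply simplified the vectors into one matrix
```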

This prompted me to look further. Here are the microbenchmark results:

library(microbenchmark)

microbenchmark(
db.mat=t(apply(X = df, MARGIN = 1, function(x)
         replace(x = x, list = x[NCOL(df)]:(NCOL(df)-1), values = 0))),
db.df=data.frame(t(apply(X = df, MARGIN = 1, function(x)
         replace(x = x, list = x[NCOL(df)]:(NCOL(df)-1), values = 0)))),
lmo.list=setNames(lapply(head(seq_along(df), -1),
                    function(i) df[, i] * (i < df$ZeroMth)),
                  head(names(df), -1)),
lmo.dfl=setNames(data.frame(lapply(head(seq_along(df), -1),
                         function(i) df[, i] * (i < df$ZeroMth))),
                 head(names(df), -1)),
lmo.dfs=setNames(data.frame(sapply(head(seq_along(df), -1),
                           function(i) df[, i] * (i < df$ZeroMth))),
                 head(names(df), -1)),
# Alt variants: zero the months at or after ZeroMth by subscript assignment
lmo.listAlt=setNames(lapply(head(seq_along(df), -1),
                    function(i) {temp <- df[, i]; temp[i >= df$ZeroMth] <- 0; temp}),
                  head(names(df), -1)),
lmo.dflAlt=setNames(data.frame(lapply(head(seq_along(df), -1),
                         function(i) {temp <- df[, i]; temp[i >= df$ZeroMth] <- 0; temp})),
                 head(names(df), -1)),
lmo.dfsAlt=setNames(data.frame(sapply(head(seq_along(df), -1),
                           function(i) {temp <- df[, i]; temp[i >= df$ZeroMth] <- 0; temp})),
                 head(names(df), -1)))

Unit: microseconds
        expr     min       lq     mean   median      uq      max neval  cld
      db.mat 135.994 155.2380 161.2480 159.6570 166.785  196.436   100  b  
       db.df 225.231 236.9190 248.3295 246.0430 256.164  340.411   100   c 
    lmo.list  84.960  99.5005 105.8299 104.9175 110.905  156.806   100 a   
     lmo.dfl 439.057 459.1565 480.3425 476.5475 492.656  647.751   100    d
     lmo.dfs 173.057 187.3120 217.2876 195.8650 202.850 2257.151   100   c 
 lmo.listAlt  91.803 108.0535 114.6253 113.1860 118.602  185.602   100 ab  
  lmo.dflAlt 458.158 481.2520 521.6052 498.2155 516.462 2584.163   100    d
  lmo.dfsAlt 181.610 198.4310 221.5613 204.2755 212.686 1611.395   100   c

Wow, calling data.frame on the lapply result is very slow.
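Timings are only comparable if the variants return the same values. The Alt versions zero cells by subscript assignment, so their condition has to select the months at or after ZeroMth (i >= df$ZeroMth), the complement of the i < df$ZeroMth test used with multiplication. A sketch of that consistency check, using sapply for both styles:

```r
# Sample data from the question
df <- data.frame(Mth1 = c(1L, 3L, 4L, 1L, 2L),
                 Mth2 = c(2L, 3L, 2L, 2L, 2L),
                 Mth3 = c(1L, 2L, 1L, 2L, 3L),
                 Mth4 = c(3L, 1L, 3L, 4L, 2L),
                 ZeroMth = c(1L, 3L, 2L, 4L, 3L))

months <- head(seq_along(df), -1)

# Multiplication: keep a value while its month comes before ZeroMth
mult <- sapply(months, function(i) df[, i] * (i < df$ZeroMth))

# Subscript assignment: zero the months at or after ZeroMth
alt <- sapply(months, function(i) {
  temp <- df[, i]
  temp[i >= df$ZeroMth] <- 0
  temp
})

stopifnot(all(mult == alt))  # both encode the same rule
```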

Answer 3

We can also make this compact with

(col(df[-5]) < df$ZeroMth[row(df[-5])]) * df[-5]
#    Mth1 Mth2 Mth3 Mth4
#1    0    0    0    0
#2    3    3    0    0
#3    4    0    0    0
#4    1    2    2    0
#5    2    2    0    0
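Here col() and row() give, for every cell of the month columns, its column number and its row number; indexing ZeroMth with the row numbers looks up each cell's own zero month, so a single comparison builds the whole keep/zero mask. A sketch that keeps the intermediate mask around (the names m, mask and out are only for illustration):

```r
# Sample data from the question
df <- data.frame(Mth1 = c(1L, 3L, 4L, 1L, 2L),
                 Mth2 = c(2L, 3L, 2L, 2L, 2L),
                 Mth3 = c(1L, 2L, 1L, 2L, 3L),
                 Mth4 = c(3L, 1L, 3L, 4L, 2L),
                 ZeroMth = c(1L, 3L, 2L, 4L, 3L))

m <- df[-5]  # month columns only (drop ZeroMth)

# For every cell: does its column come before that row's ZeroMth?
mask <- col(m) < df$ZeroMth[row(m)]
mask

# Logical matrix times data frame: FALSE cells become 0
out <- mask * m
out
```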

R - Replace values starting in selected column
