I want to replace monthly values with zeros from a specific month onward. I tried adapting "Replace NA values in dataframe starting in varying columns" but had no success. Given the data:
df <- structure(list(Mth1 = c(1L, 3L, 4L, 1L, 2L),
Mth2 = c(2L, 3L, 2L, 2L, 2L),
Mth3 = c(1L, 2L, 1L, 2L, 3L),
Mth4 = c(3L, 1L, 3L, 4L, 2L),
ZeroMth = c(1L, 3L, 2L, 4L, 3L)),
.Names = c("Mth1", "Mth2", "Mth3", "Mth4", "ZeroMth"), class = "data.frame",
row.names = c("1", "2", "3", "4", "5"))
> df
  Mth1 Mth2 Mth3 Mth4 ZeroMth
1    1    2    1    3       1
2    3    3    2    1       3
3    4    2    1    3       2
4    1    2    2    4       4
5    2    2    3    2       3
I want to use the values in the ZeroMth column to specify the month at which the replacement starts. The desired output is:
> df1
  Mth1 Mth2 Mth3 Mth4
1    0    0    0    0
2    3    3    0    0
3    4    0    0    0
4    1    2    2    0
5    2    2    0    0
Answer 0 (score: 2)
Use apply on each row (MARGIN = 1) and, with replace, set the values from the index given in the last column onward to zero.
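A minimal sketch of this approach, reconstructed from the `db.mat` expression in the benchmark further down rather than from this answer's own code. The names `n_months`, `zeroed` and `df1` are illustrative. The sketch assumes ZeroMth is the last column and that every ZeroMth value lies between 1 and the number of month columns; a larger value would make the `:` sequence run backwards and zero the wrong positions.

```r
# Sample data from the question
df <- data.frame(Mth1 = c(1L, 3L, 4L, 1L, 2L),
                 Mth2 = c(2L, 3L, 2L, 2L, 2L),
                 Mth3 = c(1L, 2L, 1L, 2L, 3L),
                 Mth4 = c(3L, 1L, 3L, 4L, 2L),
                 ZeroMth = c(1L, 3L, 2L, 4L, 3L))

# Number of month columns (assumption: ZeroMth is the last column)
n_months <- ncol(df) - 1L

# apply() passes each row as a named vector; replace() zeroes
# positions ZeroMth..n_months of that row. apply() returns the rows
# as columns, so t() turns the result back into row orientation.
zeroed <- t(apply(df, 1, function(x) replace(x, x[["ZeroMth"]]:n_months, 0L)))

# Drop the ZeroMth column to match the desired output
df1 <- as.data.frame(zeroed[, seq_len(n_months)])
df1
```

Because apply works on a matrix, the result has to be converted back with as.data.frame if a data frame is needed.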
Answer 1 (score: 2)
You can also use lapply like this:
setNames(data.frame(lapply(head(seq_along(df), -1), function(i) df[, i] * (i < df$ZeroMth))),
head(names(df), -1))
which returns
  Mth1 Mth2 Mth3 Mth4
1    0    0    0    0
2    3    3    0    0
3    4    0    0    0
4    1    2    2    0
5    2    2    0    0
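Here, you iterate over the positions of the month columns and check, row by row, whether the position is less than the specified zero month. If it is, the value is kept; otherwise it becomes 0. setNames is used to restore the variable names.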
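To make the masking step concrete, here is a small worked example for a single column (position 2, Mth2), using the question's data. The variable names `i`, `keep` and `mth2_zeroed` are only illustrative.

```r
# Sample data from the question
df <- data.frame(Mth1 = c(1L, 3L, 4L, 1L, 2L),
                 Mth2 = c(2L, 3L, 2L, 2L, 2L),
                 Mth3 = c(1L, 2L, 1L, 2L, 3L),
                 Mth4 = c(3L, 1L, 3L, 4L, 2L),
                 ZeroMth = c(1L, 3L, 2L, 4L, 3L))

i <- 2L                    # position of the Mth2 column
keep <- i < df$ZeroMth     # TRUE where month 2 comes before the row's zero month
keep
# [1] FALSE  TRUE FALSE  TRUE  TRUE

# In arithmetic, FALSE counts as 0 and TRUE as 1, so the product
# keeps the value where keep is TRUE and yields 0 elsewhere.
mth2_zeroed <- df[, i] * keep
mth2_zeroed
# [1] 0 3 0 2 2
```

The lapply call simply repeats this for every month position and collects the resulting columns.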
After testing, changing lapply to sapply gives a speedup of more than 2x. The main slowdown comes from the conversion to data.frame.
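As a rough illustration of the structural difference (not a reproduction of the timing claim), the sketch below shows that lapply hands data.frame a list of columns while sapply hands it a matrix, and that both routes give the same values. The names `month_idx`, `as_list`, `as_matrix`, `res_l` and `res_s` are made up for this example.

```r
# Sample data from the question
df <- data.frame(Mth1 = c(1L, 3L, 4L, 1L, 2L),
                 Mth2 = c(2L, 3L, 2L, 2L, 2L),
                 Mth3 = c(1L, 2L, 1L, 2L, 3L),
                 Mth4 = c(3L, 1L, 3L, 4L, 2L),
                 ZeroMth = c(1L, 3L, 2L, 4L, 3L))

month_idx <- head(seq_along(df), -1)   # positions of the month columns: 1..4

# lapply returns a list with one vector per month column
as_list <- lapply(month_idx, function(i) df[, i] * (i < df$ZeroMth))

# sapply simplifies the same result into a 5 x 4 matrix
as_matrix <- sapply(month_idx, function(i) df[, i] * (i < df$ZeroMth))

is.list(as_list)       # TRUE
is.matrix(as_matrix)   # TRUE

# Either structure can be turned into a data frame with the original names
res_l <- setNames(data.frame(as_list), head(names(df), -1))
res_s <- setNames(data.frame(as_matrix), head(names(df), -1))
all.equal(res_l, res_s)
```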
This prompted me to look further. Here are the microbenchmark results.
library(microbenchmark)

microbenchmark(
  # apply/replace approach, as a matrix and converted to a data frame
  db.mat = t(apply(X = df, MARGIN = 1, function(x)
    replace(x = x, list = x[NCOL(df)]:(NCOL(df)-1), values = 0))),
  db.df = data.frame(t(apply(X = df, MARGIN = 1, function(x)
    replace(x = x, list = x[NCOL(df)]:(NCOL(df)-1), values = 0)))),
  # multiply each column by a logical mask: list, lapply + data.frame, sapply + data.frame
  lmo.list = setNames(lapply(head(seq_along(df), -1),
      function(i) df[, i] * (i < df$ZeroMth)),
    head(names(df), -1)),
  lmo.dfl = setNames(data.frame(lapply(head(seq_along(df), -1),
      function(i) df[, i] * (i < df$ZeroMth))),
    head(names(df), -1)),
  lmo.dfs = setNames(data.frame(sapply(head(seq_along(df), -1),
      function(i) df[, i] * (i < df$ZeroMth))),
    head(names(df), -1)),
  # "Alt" variants: assign zeros by index (at or after the zero month) instead of multiplying
  lmo.listAlt = setNames(lapply(head(seq_along(df), -1),
      function(i) {temp <- df[, i]; temp[i >= df$ZeroMth] <- 0; temp}),
    head(names(df), -1)),
  lmo.dflAlt = setNames(data.frame(lapply(head(seq_along(df), -1),
      function(i) {temp <- df[, i]; temp[i >= df$ZeroMth] <- 0; temp})),
    head(names(df), -1)),
  lmo.dfsAlt = setNames(data.frame(sapply(head(seq_along(df), -1),
      function(i) {temp <- df[, i]; temp[i >= df$ZeroMth] <- 0; temp})),
    head(names(df), -1)))
Unit: microseconds
        expr     min       lq     mean   median      uq      max neval cld
      db.mat 135.994 155.2380 161.2480 159.6570 166.785  196.436   100  b
       db.df 225.231 236.9190 248.3295 246.0430 256.164  340.411   100   c
    lmo.list  84.960  99.5005 105.8299 104.9175 110.905  156.806   100 a
     lmo.dfl 439.057 459.1565 480.3425 476.5475 492.656  647.751   100    d
     lmo.dfs 173.057 187.3120 217.2876 195.8650 202.850 2257.151   100   c
 lmo.listAlt  91.803 108.0535 114.6253 113.1860 118.602  185.602   100 ab
  lmo.dflAlt 458.158 481.2520 521.6052 498.2155 516.462 2584.163   100    d
  lmo.dfsAlt 181.610 198.4310 221.5613 204.2755 212.686 1611.395   100   c
Wow, lapply followed by data.frame is super slow.
Answer 2 (score: 0)
We can also make this compact with
(col(df[-5]) < df$ZeroMth[row(df[-5])]) * df[-5]
#  Mth1 Mth2 Mth3 Mth4
#1    0    0    0    0
#2    3    3    0    0
#3    4    0    0    0
#4    1    2    2    0
#5    2    2    0    0
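To see what the one-liner does, the sketch below builds the logical mask as a separate step. It converts the month columns to a matrix first, which is a simplification of mine and not part of the answer; `months`, `mask` and `months_zeroed` are illustrative names.

```r
# Sample data from the question
df <- data.frame(Mth1 = c(1L, 3L, 4L, 1L, 2L),
                 Mth2 = c(2L, 3L, 2L, 2L, 2L),
                 Mth3 = c(1L, 2L, 1L, 2L, 3L),
                 Mth4 = c(3L, 1L, 3L, 4L, 2L),
                 ZeroMth = c(1L, 3L, 2L, 4L, 3L))

months <- as.matrix(df[-5])   # 5 x 4 matrix of the month values only

# col() gives each cell's column index and row() its row index, so
# df$ZeroMth[row(months)] lines up every cell with its row's zero month.
# The mask is TRUE where the cell's month comes before that zero month.
mask <- col(months) < df$ZeroMth[row(months)]
mask

# Multiplying by the mask keeps values where it is TRUE and zeroes the rest
months_zeroed <- months * mask
months_zeroed
```

The result here is a matrix; the answer's version multiplies by df[-5] directly and so returns a data frame.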