Question

I am running a simple ifelse function:

f <- function(x) {
    ifelse(x == shift(x), x + 0.001 * sd(x, na.rm = TRUE), x)
}

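Here shift comes from the data.table package.

The function lets me change, in each column of a data frame (using apply), any value that is exactly equal to the one before it. The problem is that ifelse returns a result whose length equals the length of test. In this case, that length is the one of shift(x) rather than the one of x. As a result, the first element of each column (or the last one, if type = "lead" is used instead of the default "lag") becomes NA.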

Here is a MWE:

library(data.table)  # provides shift()

a <- c(1,2,2,3,4,5,6)
b <- c(4,5,6,7,8,8,9)
data <- data.frame(cbind(a,b))
f <- function(x) {
    ifelse(x == shift(x), x + 0.001 * sd(x, na.rm = TRUE), x)
}
# apply f to every column (MARGIN = 2)
apply(data, 2, f)

So I thought I could modify the ifelse function. I have made a few attempts at changing length.out, but have not succeeded so far:

function (test, yes, no)
{
    if (is.atomic(test)) {
        if (typeof(test) != "logical")
            storage.mode(test) <- "logical"
        if (length(test) == 1 && is.null(attributes(test))) {
            if (is.na(test))
                return(NA)
            else if (test) {
                if (length(yes) == 1 && is.null(attributes(yes)))
                    return(yes)
            }
            else if (length(no) == 1 && is.null(attributes(no)))
                return(no)
        }
    }
    else test <- if (isS4(test))
        methods::as(test, "logical")
    else as.logical(test)
    ans <- test
    ok <- !(nas <- is.na(test))
    if (any(test[ok]))
        ans[test & ok] <- rep(yes, length.out = length(ans))[test &
            ok]
    if (any(!test[ok]))
        ans[!test & ok] <- rep(no, length.out = length(ans))[!test &
            ok]
    ans[nas] <- NA
    ans
}

Edit

My original code was:

copy <- copy(data)
for (j in seq_len(ncol(copy))) {
    for (i in 2:nrow(copy)) {
        # test for NA first, so that `&&` never has to evaluate an NA comparison
        if (!is.na(copy[i,j]) && !is.na(copy[i-1,j]) && copy[i,j] == copy[i-1,j]) {
            copy[i,j] <- copy[i-1,j] + (0.0001*sd(copy[,j], na.rm = TRUE))
        }
    }
}

But with large matrices this can be slow, and it has to be repeated many times. The goal is a vectorized, faster approach using a function and apply.

Answer 1

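As you mentioned, your approach produces an NA in the first element of the vector returned by f. The first element cannot be equal to the previous one (there is none), so we want the first value to stay unchanged.

A straightforward approach is to do exactly that. Apologies: although it does solve your problem, it does not answer the question in your title.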

f <- function(x) { 
    # storing the output of ifelse in a variable
    out <- ifelse(x==shift(x), x + 0.001* sd(x, na.rm = TRUE), x)
    # changing the first element of `out` into first element of x 
    out[1] <- x[1]
    # returning `out` -- in a R function, 
    # the last thing evaluated is returned
    out 
}
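On the title question itself, a small check suggests that length.out is not the culprit: the lagged vector has the same length as x, and the NA appears because the first comparison is made against the NA padding. The sketch below is only an illustration. It uses a hypothetical helper lag1() as a base-R stand-in for shift() with its default arguments, so that it runs without extra packages, and f_na_safe is likewise a made-up name. Treating an NA test as FALSE is an alternative to patching out[1] afterwards.

```r
# lag1() is a hypothetical base-R stand-in for data.table::shift(x)
# with default arguments (lag by one position, pad with NA).
lag1 <- function(x) c(NA, head(x, -1))

x <- c(1, 2, 2, 3, 4, 5, 6)

# The lagged vector has the same length as x ...
same_length <- length(lag1(x)) == length(x)

# ... but the first comparison is against NA, so the test is NA there,
# and ifelse() returns NA wherever its test is NA.
test <- x == lag1(x)
first_is_na <- is.na(test[1])

# Treating an NA test as FALSE keeps the original value instead.
f_na_safe <- function(x) {
  same <- x == lag1(x)
  same <- !is.na(same) & same
  ifelse(same, x + 0.001 * sd(x, na.rm = TRUE), x)
}

out <- f_na_safe(x)
```

With this variant, values that are NA in x simply stay NA, because the test is FALSE for them and ifelse() falls through to x.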

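Note that elements repeated more than twice (e.g. c(1,2,2,2,3)) are not handled correctly: the second and third 2 both receive the same offset, so they are still equal to each other. Also, this changes all flagged elements in the same way. So in c(1,2,2,1,2,2), every second 2 is changed to the same value. This may or may not be what you want.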

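You could probably hack something together (the comments suggest ?rle), but I would suggest changing the way your data is randomized, if that makes sense for your particular data.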
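For what it is worth, the rle idea could look roughly like the sketch below. This is an assumption about what the comments had in mind, not a reproduction of them, and spread_runs is a hypothetical name. It gives the k-th repeat inside a run an offset of k times a small step, so runs longer than two also end up with distinct values. Note that this is a different rule from the original loop.

```r
# Hypothetical helper: offset each repeated value by its position in the run.
# eps defaults to the same 0.001 * sd(x) step used in f above.
spread_runs <- function(x, eps = 0.001 * sd(x, na.rm = TRUE)) {
  # rle() finds runs of equal consecutive values; sequence() numbers the
  # elements inside each run 1, 2, 3, ...; subtracting 1 leaves the first
  # element of every run unchanged.
  k <- sequence(rle(x)$lengths) - 1
  x + k * eps
}

x <- c(1, 2, 2, 2, 3)
out <- spread_runs(x)
```

Here the first element of every run keeps its value, and the later ones are shifted by eps, 2 * eps, and so on. If eps is not small relative to the gaps between distinct values, a shifted value could collide with a neighbouring one, so the step size needs to suit the data.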

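Perhaps, instead of adding 0.001*sd, you could add Gaussian noise with a standard deviation of that order? This obviously depends on your application.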

f <- function(x) { 
    # adding gaussian noise with small sd to repeated values
    # storing the output in a variable `out`
    out <- ifelse(x==shift(x), 
                  x + rnorm(length(x), mean=0, 
                            sd=0.01*sd(x, na.rm = TRUE)),
                  x)
    # changing the first element of `out` into first element of x 
    out[1] <- x[1]
    # returning `out` -- in a R function, 
    # the last thing evaluated is returned
    out 
}

It depends on what your purpose is in getting rid of exactly repeated values.
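One property of the noise variant is that runs longer than two are separated as well, since every flagged element receives its own random draw. The following is a minimal sketch of that behaviour, again using a hypothetical lag1() helper in place of shift() so that it runs in base R. The name f_noise and the seed are arbitrary choices for illustration.

```r
# lag1() is a hypothetical base-R stand-in for data.table::shift(x).
lag1 <- function(x) c(NA, head(x, -1))

f_noise <- function(x) {
  same <- x == lag1(x)
  same <- !is.na(same) & same          # treat the NA test as "not a repeat"
  # one independent draw per element; only the draws at repeats are used
  noise <- rnorm(length(x), mean = 0, sd = 0.01 * sd(x, na.rm = TRUE))
  ifelse(same, x + noise, x)
}

set.seed(1)  # arbitrary seed, only to make the run reproducible
x <- c(1, 2, 2, 2, 3)
out <- f_noise(x)
```

Because the noise can be negative, a jittered value may end up slightly below the value it repeated, so this only makes sense if the ordering of near-equal values does not matter for the application.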

Changing length.out in the ifelse function
