Question

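I have read data column-wise into a data frame in R. Some of the columns only ever increase in value; for those columns alone, I want to replace each value (n) with its difference from the previous value in that column. For example, looking at a single column, I want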

c(1,2,5,7,8)

to be replaced with

c(1,3,2,1)

which are the differences between consecutive elements.

However, it is getting late in the day and I think my brain has just stopped working. Here is my code so far:

col1 <- c(1,2,3,4,NA,2,3,1) # This column rises and falls, so we want to ignore it
col2 <- c(1,2,3,5,NA,5,6,7) # Note: this column always rises in value, so we want to replace it with deltas
col3 <- c(5,4,6,7,NA,9,3,5) # This column rises and falls, so we want to ignore it
d <- cbind(col1, col2, col3)
d
fix_data <- function(data) {
    # Iterate through each column...
    for (column in data[,1:dim(data)[2]]) {
        lastvalue <- 0
        # Now walk through each value in the column, 
        # checking to see if the column consistently rises in value
        for (value in column) {
            if (is.na(value) == FALSE) { # Need to ignore NAs
                if (value >= lastvalue) {
                    alwaysIncrementing <- TRUE
                } else {
                    alwaysIncrementing <- FALSE
                    break
                }
            }
        }

        if (alwaysIncrementing) {
            print(paste("Column", column, "always increments"))
        }

        # If a column is always incrementing, alwaysIncrementing will now be TRUE
        # In this case, I want to replace each element in the column with the delta between successive
        # elements.  The size of the column shrinks by 1 in doing this, so just prepend a copy of
        # the 1st element to the start of the list to ensure the column length remains the same
        if (alwaysIncrementing) {
            print(paste("This is an incrementing column:", colnames(column)))
            column <- c(column[1], diff(column, lag=1))
        }
    }
    data
}

fix_data(d)
d

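If you copy/paste this code into RGui, you'll see that it does nothing to the supplied data frame.

Apart from losing my mind, what am I doing wrong?

Thanks in advance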

Answer 1

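Without going into the code in detail: you are assigning values to column, which is a local variable within the loop (i.e. there is no relationship between column and data in that context). You need to assign these values to the appropriate elements of data.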

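Also, data is local to your function, so you need to assign the function's return value back to your original object after running it (for example, d <- fix_data(d)).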
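To make the two points above concrete, here is a minimal sketch of how the function could be restructured. It assumes the matrix from the question and its convention of keeping the first element so the column length does not change; the name always_incrementing is only illustrative. Note also that a for loop over a matrix visits individual elements rather than columns, so the sketch loops over column indices instead.

```r
# Sample data from the question (a matrix built with cbind)
col1 <- c(1, 2, 3, 4, NA, 2, 3, 1)
col2 <- c(1, 2, 3, 5, NA, 5, 6, 7)
col3 <- c(5, 4, 6, 7, NA, 9, 3, 5)
d <- cbind(col1, col2, col3)

fix_data <- function(data) {
  # Loop over column indices so results can be written back into `data`
  for (j in seq_len(ncol(data))) {
    column <- data[, j]
    # TRUE if the non-NA values never decrease
    always_incrementing <- all(diff(column[!is.na(column)]) >= 0)
    if (always_incrementing) {
      # Keep the first element so the column length is unchanged
      data[, j] <- c(column[1], diff(column))
    }
  }
  data  # return the modified local copy
}

# The function works on a copy, so the result must be assigned back
d <- fix_data(d)
d
```

Only col2 is rewritten here; the positions around the NA become NA in the delta column as well, because diff propagates missing values.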

Incidentally, you can use diff to check whether a column's values ever decrease, rather than looping through each value:

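# TRUE for each column whose non-NA values never decrease
idx <- apply(d, 2, function(x) !any(diff(x[!is.na(x)]) < 0))
# "blah" is a placeholder for the replacement values (the deltas)
d[, idx] <- blah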
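One possible way to fill in the blah placeholder, sketched under the assumption that you want the question's convention of c(first value, successive differences) so that each column keeps its length:

```r
# Same data as in the question, built as a matrix
d <- cbind(col1 = c(1, 2, 3, 4, NA, 2, 3, 1),
           col2 = c(1, 2, 3, 5, NA, 5, 6, 7),
           col3 = c(5, 4, 6, 7, NA, 9, 3, 5))

# TRUE for each column whose non-NA values never decrease
idx <- apply(d, 2, function(x) !any(diff(x[!is.na(x)]) < 0))

# Only touch the matrix if at least one column qualifies
if (any(idx)) {
  # drop = FALSE keeps a single selected column as a matrix for apply()
  d[, idx] <- apply(d[, idx, drop = FALSE], 2,
                    function(x) c(x[1], diff(x)))
}
d
```

The drop = FALSE matters because selecting a single column from a matrix would otherwise return a plain vector, which apply cannot iterate over by column.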

Answer 2

diff calculates the differences between consecutive values in a vector. You can apply it to each column of a data frame using, e.g.

dfr <- data.frame(x = c(1,2,5,7,8), y = (1:5)^2)
as.data.frame(lapply(dfr, diff))

  x y
1 1 3
2 3 5
3 2 7
4 1 9
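Note that the result above has one row fewer than the input and transforms every column. If you really do have a data frame and want what the question asks for, a rough sketch might look like the following. The second column here is made-up illustrative data that rises and falls, and is_increasing is just a hypothetical helper name.

```r
# x always rises; y rises and falls (illustrative values only)
dfr <- data.frame(x = c(1, 2, 5, 7, 8), y = c(3, 1, 4, 1, 5))

# One TRUE/FALSE per column: are all successive differences positive?
is_increasing <- vapply(dfr,
                        function(col) all(diff(col) > 0, na.rm = TRUE),
                        logical(1))

# Replace only the increasing columns, keeping the first value
# so that the number of rows stays the same
dfr[is_increasing] <- lapply(dfr[is_increasing],
                             function(col) c(col[1], diff(col)))
dfr
```

Using vapply rather than sapply guarantees a logical vector with one entry per column, which is what the column subsetting needs.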

Edit: I've just noticed something. You are working with a matrix, not a data frame (as you stated in the question). For your matrix 'd', you can use

d_diff <- apply(d, 2, diff)
#Find columns that are (strictly) increasing
incr <- apply(d_diff, 2, function(x) all(x > 0, na.rm=TRUE))
#Replace values in the appropriate columns
d[2:nrow(d),incr] <- d_diff[,incr]

Selectively replacing columns in R with their delta values
