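I have read data into a data frame in R, column by column. Some of the columns only ever increase in value; for those columns only, I want to replace each value (n) with its difference from the previous value in that column. For example, looking at an individual column, I want
c(1,2,5,7,8)
to be replaced with
c(1,3,2,1)
which are the differences between successive elements.
However, it's getting late in the day, and I think my brain has just stopped working. Here is my code so far: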
col1 <- c(1,2,3,4,NA,2,3,1) # This column rises and falls, so we want to ignore it
col2 <- c(1,2,3,5,NA,5,6,7) # Note: this column always rises in value, so we want to replace it with deltas
col3 <- c(5,4,6,7,NA,9,3,5) # This column rises and falls, so we want to ignore it
d <- cbind(col1, col2, col3)
d
fix_data <- function(data) {
  # Iterate through each column...
  for (column in data[,1:dim(data)[2]]) {
    lastvalue <- 0
    # Now walk through each value in the column,
    # checking to see if the column consistently rises in value
    for (value in column) {
      if (is.na(value) == FALSE) { # Need to ignore NAs
        if (value >= lastvalue) {
          alwaysIncrementing <- TRUE
        } else {
          alwaysIncrementing <- FALSE
          break
        }
      }
    }
    if (alwaysIncrementing) {
      print(paste("Column", column, "always increments"))
    }
    # If a column is always incrementing, alwaysIncrementing will now be TRUE
    # In this case, I want to replace each element in the column with the delta between successive
    # elements. The size of the column shrinks by 1 in doing this, so just prepend a copy of
    # the 1st element to the start of the list to ensure the column length remains the same
    if (alwaysIncrementing) {
      print(paste("This is an incrementing column:", colnames(column)))
      column <- c(column[1], diff(column, lag=1))
    }
  }
  data
}
fix_data(d)
d
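If you copy and paste this code into RGui, you'll see that it does nothing to the supplied data frame.
Besides losing my mind, what am I doing wrong?
Thanks in advance.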
Answer 0 (score: 3)
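Without going through the code in detail: you are assigning values to column, which is a local variable within the loop (i.e. there is no relationship between column and data in that context). You need to assign those values to the appropriate elements of data.
Also, data is local to your function, so after running the function you need to assign its return value back to your original object (d in your example).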
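A minimal sketch of both fixes, keeping the loop-based structure from the question. It loops over column indices so the result can be written back into data (for a matrix, iterating over data[, ...] directly walks individual elements rather than whole columns), and the caller reassigns the return value. The is_increasing helper is a name introduced here for illustration and is not part of the original code.

```r
col1 <- c(1,2,3,4,NA,2,3,1)
col2 <- c(1,2,3,5,NA,5,6,7)
col3 <- c(5,4,6,7,NA,9,3,5)
d <- cbind(col1, col2, col3)

# Hypothetical helper: TRUE if the non-NA values never decrease
is_increasing <- function(x) {
  x <- x[!is.na(x)]
  all(diff(x) >= 0)
}

fix_data <- function(data) {
  # Loop over column indices so we can write back into data
  for (j in seq_len(ncol(data))) {
    column <- data[, j]
    if (is_increasing(column)) {
      # Prepend the first element so the column keeps its length
      data[, j] <- c(column[1], diff(column))
    }
  }
  data
}

# Reassign: the function only modifies its own local copy
d <- fix_data(d)
```

Only col2 is changed here; col1 and col3 rise and fall, so they are left alone.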
Incidentally, rather than looping through every value, you can use diff to check whether a column ever decreases:
idx <- apply(d, 2, function(x) !any(diff(x[!is.na(x)]) < 0))
d[,idx] <- blah
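One way to fill in the blah placeholder, as a sketch only: compute the deltas for the selected columns and prepend each column's first value so the matrix keeps its shape. This assumes the matrix d from the question; delta is a hypothetical helper name.

```r
col1 <- c(1,2,3,4,NA,2,3,1)
col2 <- c(1,2,3,5,NA,5,6,7)
col3 <- c(5,4,6,7,NA,9,3,5)
d <- cbind(col1, col2, col3)

# Columns whose non-NA values never decrease
idx <- apply(d, 2, function(x) !any(diff(x[!is.na(x)]) < 0))

# Hypothetical helper: first value followed by successive differences
delta <- function(x) c(x[1], diff(x))

# drop = FALSE keeps a matrix even when only one column is selected
d[, idx] <- apply(d[, idx, drop = FALSE], 2, delta)
```

Differences taken next to an NA come out as NA, so the gap in col2 widens from one missing value to two.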
Answer 1 (score: 2)
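diff calculates the differences between consecutive values in a vector. You can apply it to each column of a data frame with, for example: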
dfr <- data.frame(x = c(1,2,5,7,8), y = (1:5)^2)
as.data.frame(lapply(dfr, diff))
x y
1 1 3
2 3 5
3 2 7
4 1 9
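The result above has one row fewer than the input. If the data frame should keep its original number of rows, as the question's code comments suggest, a possible variant is to prepend each column's first value. This is a sketch under that assumption, not part of the original answer.

```r
dfr <- data.frame(x = c(1,2,5,7,8), y = (1:5)^2)

# Assigning into dfr[] keeps the data frame structure and column names;
# each column becomes its first value followed by the successive differences
dfr[] <- lapply(dfr, function(col) c(col[1], diff(col)))
```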
EDIT: I just noticed something. You are using a matrix, not a data frame (as you stated in the question). For your matrix 'd', you can use
d_diff <- apply(d, 2, diff)
#Find columns that are (strictly) increasing
incr <- apply(d_diff, 2, function(x) all(x > 0, na.rm=TRUE))
#Replace values in the appropriate columns
d[2:nrow(d),incr] <- d_diff[,incr]
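With the sample d, the code above leaves NA in rows 5 and 6 of col2, because diff cannot bridge the missing value. If you would rather difference only the observed values and leave the NA where it is, one possible variant is sketched below. delta_skip_na is a hypothetical helper, and treating the values on either side of the NA as consecutive is an assumption that may not suit your data. The check is relaxed from strictly increasing to non-decreasing, since col2 holds 5 on both sides of the NA.

```r
col1 <- c(1,2,3,4,NA,2,3,1)
col2 <- c(1,2,3,5,NA,5,6,7)
col3 <- c(5,4,6,7,NA,9,3,5)
d <- cbind(col1, col2, col3)

# Hypothetical helper: difference the non-NA values only, keep NAs in place
delta_skip_na <- function(x) {
  ok <- !is.na(x)
  if (!any(ok)) return(x)  # nothing to difference
  x[ok] <- c(x[ok][1], diff(x[ok]))
  x
}

# Columns whose non-NA values never decrease
incr <- apply(d, 2, function(x) all(diff(x[!is.na(x)]) >= 0))

# drop = FALSE keeps a matrix even when only one column is selected
d[, incr] <- apply(d[, incr, drop = FALSE], 2, delta_skip_na)
```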