Question

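I'm having a few problems with my R script. I have a data set containing many series, each made up of NAs and numeric values. Within each series, I want to replace NA with 0 from the point where the first numeric value appears, but keep the NAs that come before the series has started.

As you can see below, in the second column for example, I want to keep the first 2 NAs but replace the fourth value with 0.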

example

Here is my script, but it doesn't work:

my actual script

Any suggestions would be much appreciated.

Thanks a lot,

ER

Answer 1

In case you, or anyone else, want to avoid loops:

# example dataset
df = data.frame(x1 = c(23,NA,NA,35),
                x2 = c(NA,NA,45,NA),
                x3 = c(4,34,NA,5))

# function to replace NAs not in the beginning of vector with 0
f = function(x) { x[is.na(x) & cumsum(!is.na(x)) != 0] = 0; x }

# apply function and save as dataframe
data.frame(sapply(df, f))

#   x1 x2 x3
# 1 23 NA  4
# 2  0 NA 34
# 3  0 45  0
# 4 35  0  5
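
To see why f works, it may help to look at the intermediate vectors for a single column. Below is a minimal sketch on the x2 column from the example data; the names x, seen and mask are only illustrative and do not appear in the function itself.

```r
# the x2 column from the example dataset
x <- c(NA, NA, 45, NA)

# running count of non-NA values seen so far
seen <- cumsum(!is.na(x))
# 0 0 1 1

# TRUE only for NAs that come after the first non-NA value
mask <- is.na(x) & seen != 0
# FALSE FALSE FALSE TRUE

# overwrite just those positions; leading NAs are left alone
x[mask] <- 0
x
# NA NA 45  0
```

A column that contains only NAs has a running count of 0 everywhere, so nothing in it gets replaced.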

Or, using the tidyverse with the same function f:

library(tidyverse)

df %>% map_df(f)

# # A tibble: 4 x 3
#     x1    x2    x3
#   <dbl> <dbl> <dbl>
# 1   23.   NA     4.
# 2    0.   NA    34.
# 3    0.   45.    0.
# 4   35.    0.    5.
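
If you would rather stay in base R and keep the result as a data frame without going through a matrix (which is what sapply returns here), something along these lines should also work. This is only a sketch on the same example data, and it assumes you are happy to overwrite df in place.

```r
# same example dataset and function as above
df <- data.frame(x1 = c(23, NA, NA, 35),
                 x2 = c(NA, NA, 45, NA),
                 x3 = c(4, 34, NA, 5))

f <- function(x) { x[is.na(x) & cumsum(!is.na(x)) != 0] <- 0; x }

# df[] keeps the data.frame structure and column names,
# while lapply applies f to each column separately
df[] <- lapply(df, f)
df
```

Because each column is processed on its own, columns of different types are not coerced to a common type the way they would be in a matrix.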

Answer 2

If this is your data set:

ORIGINAL_DATA <- data.frame(X1 = c(23, NA, NA, 35), 
                            X2 = c(NA, NA, 45, NA), 
                            X3 = c(4, 34, NA, 5))

then this might work:

for (i in seq_len(ncol(ORIGINAL_DATA))) {
  for (j in seq_len(nrow(ORIGINAL_DATA))) {
    if (!is.na(ORIGINAL_DATA[j, i])) {
      # Row j holds the first non-NA value of column i:
      # replace every NA from here to the end of the column with 0
      rows <- j:nrow(ORIGINAL_DATA)
      ORIGINAL_DATA[rows, i] <- ifelse(is.na(ORIGINAL_DATA[rows, i]), 0, ORIGINAL_DATA[rows, i])

      # Stop scanning this column (assigning to j would not end a for loop in R)
      break
    }
  }
}
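
The same idea can probably be written without the inner loop by looking up the position of the first non-NA value directly. The following is a sketch rather than a tested drop-in replacement; the X4 column is a hypothetical addition, included only to show that a series that never starts is left untouched.

```r
ORIGINAL_DATA <- data.frame(X1 = c(23, NA, NA, 35),
                            X2 = c(NA, NA, 45, NA),
                            X3 = c(4, 34, NA, 5),
                            X4 = rep(NA_real_, 4))  # hypothetical: series that never starts

for (i in seq_len(ncol(ORIGINAL_DATA))) {
  col <- ORIGINAL_DATA[[i]]

  # position of the first non-NA value; NA if the column has none
  first <- which(!is.na(col))[1]

  if (!is.na(first)) {
    # from the first value onwards, turn remaining NAs into 0
    rows <- first:length(col)
    col[rows][is.na(col[rows])] <- 0
    ORIGINAL_DATA[[i]] <- col
  }
}

ORIGINAL_DATA
```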
