How do I replace NA in one column if the value in another column is greater than 0?

Asked: 2013-03-21 09:46:08

Tags: r replace find criteria

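I have 8 census lines (L1:L8). Currently, some records have NA instead of 0 even though the line was surveyed. I would like to replace every NA with 0 in each of the columns L1:L8 whenever the value in the corresponding effort column (EFFORT_L1:EFFORT_L8) is greater than 0 (meaning that the line was surveyed).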

Sample data:

df <-structure(list(KARTA = c("02C2H", "02C2H", "02C2H", "02C2H", 
"02C2H", "02C2H"), YEAR = c(1997L, 1997L, 1997L, 1997L, 1997L, 
1997L), ART = c("009", "031", "012", "057", "065", "073"), L1 = c(NA, 
NA, NA, NA, 2, NA), L2 = c(NA, NA, 7, NA, 3, NA), L3 = c(NA, 
NA, NA, NA, 1, NA), L4 = c(NA, NA, NA, NA, 1, NA), L5 = c(NA, 
NA, NA, NA, 1, NA), L6 = c(NA_real_, NA_real_, NA_real_, NA_real_, 
NA_real_, NA_real_), L7 = c(NA, NA, NA, 1, NA, 1), L8 = c(NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_), EFFORT_L1 = c(10, 
10, 10, 10, 10, 10), EFFORT_L2 = c(10, 10, 10, 10, 10, 10), EFFORT_L3 = c(9.625, 
9.625, 9.625, 9.625, 9.625, 9.625), EFFORT_L4 = c(10, 10, 10, 
10, 10, 10), EFFORT_L5 = c(9.125, 9.125, 9.125, 9.125, 9.125, 
9.125), EFFORT_L6 = c(9.75, 9.75, 9.75, 9.75, 9.75, 9.75), EFFORT_L7 = c(9.75, 
9.75, 9.75, 9.75, 9.75, 9.75), EFFORT_L8 = c(10, 10, 10, 10, 
10, 10), Total_Route_Effort = c(78.25, 78.25, 78.25, 78.25, 78.25, 
78.25)), .Names = c("KARTA", "YEAR", "ART", "L1", "L2", "L3", 
"L4", "L5", "L6", "L7", "L8", "EFFORT_L1", "EFFORT_L2", "EFFORT_L3", 
"EFFORT_L4", "EFFORT_L5", "EFFORT_L6", "EFFORT_L7", "EFFORT_L8", 
"Total_Route_Effort"), row.names = c(NA, 6L), class = "data.frame")

Example code for a single column (note that I am looking for an efficient solution for all eight columns):

df[is.na(df[,"L1"]) & df[,"EFFORT_L1"] > 0, "L1"] <- 0

3 answers:

Answer 0 (score: 4)

df[paste0("L", 1:8)][is.na(df[paste0("L", 1:8)]) 
                     & df[paste0("EFFORT_L", 1:8)] > 0] <- 0

Result:

> df
  KARTA YEAR ART L1 L2 L3 L4 L5 L6 L7 L8 EFFORT_L1 EFFORT_L2
1 02C2H 1997 009  0  0  0  0  0  0  0  0        10        10
2 02C2H 1997 031  0  0  0  0  0  0  0  0        10        10
3 02C2H 1997 012  0  7  0  0  0  0  0  0        10        10
4 02C2H 1997 057  0  0  0  0  0  0  1  0        10        10
5 02C2H 1997 065  2  3  1  1  1  0  0  0        10        10
6 02C2H 1997 073  0  0  0  0  0  0  1  0        10        10
  EFFORT_L3 EFFORT_L4 EFFORT_L5 EFFORT_L6 EFFORT_L7 EFFORT_L8
1     9.625        10     9.125      9.75      9.75        10
2     9.625        10     9.125      9.75      9.75        10
3     9.625        10     9.125      9.75      9.75        10
4     9.625        10     9.125      9.75      9.75        10
5     9.625        10     9.125      9.75      9.75        10
6     9.625        10     9.125      9.75      9.75        10
  Total_Route_Effort
1              78.25
2              78.25
3              78.25
4              78.25
5              78.25
6              78.25
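
To make it clearer why the one-liner above works, here is a minimal sketch on a made-up two-line data frame (`toy`, `l_cols`, `effort_cols` and `mask` are names invented for this illustration, not part of the question). `is.na()` on a data frame and `>` on a data frame both return logical matrices of the same shape, so combining them with `&` gives a cell-by-cell mask that can be used to assign into the `L` columns:

```r
# Hypothetical toy data: two "L" columns and their effort columns
toy <- data.frame(
  L1 = c(NA, 2, NA),
  L2 = c(NA, NA, 7),
  EFFORT_L1 = c(10, 10, 0),
  EFFORT_L2 = c(10, 0, 10)
)

l_cols <- paste0("L", 1:2)
effort_cols <- paste0("EFFORT_L", 1:2)

# Logical matrix: TRUE where the count is NA and the matching effort is > 0
mask <- is.na(toy[l_cols]) & toy[effort_cols] > 0

# Assign 0 only into the masked cells of the "L" columns
toy[l_cols][mask] <- 0
toy
```

Cells where the effort is 0 keep their NA, because the mask is FALSE there. This relies on the `L` and `EFFORT_L` columns being selected in the same order, so that position i in one set corresponds to position i in the other.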

Answer 1 (score: 3)

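This doesn't exactly answer your question, but it may help with future ones: looking at the questions you have asked today, have you considered converting your data to a semi-long format and working with that instead?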

Here is a toy example:

Sample data

set.seed(1)
myDF <- data.frame(
  ID1 = sample(letters[1:5], 5, replace = TRUE),
  ID2 = 1:5, ID3 = "999",
  V1 = 99, V2 = 99, V3 = 99,
  EV1 = sample(0:5, 5, replace = TRUE),
  EV2 = sample(0:3, 5, replace = TRUE),
  EV3 = sample(0:2, 5, replace = TRUE),
  stringsAsFactors = FALSE
)
myDF$ID3[c(1, 4)] <- 100
myDF$V1[c(4, 5)] <- 100
myDF$V2[c(1, 3, 5)] <- 100
myDF
#   ID1 ID2 ID3  V1  V2 V3 EV1 EV2 EV3
# 1   b   1 100  99 100 99   5   0   1
# 2   b   2 999  99  99 99   5   0   2
# 3   c   3 999  99 100 99   3   2   2
# 4   e   4 100 100  99 99   3   1   1
# 5   b   5 999 100 100 99   0   3   2

The data in semi-long (or semi-wide, depending on your perspective) format

myDFLong <- reshape(myDF, direction = "long", idvar = 1:3,
                    varying = 4:ncol(myDF), sep = "")
myDFLong
#           ID1 ID2 ID3 time   V EV
# b.1.100.1   b   1 100    1  99  5
# b.2.999.1   b   2 999    1  99  5
# c.3.999.1   c   3 999    1  99  3
# e.4.100.1   e   4 100    1 100  3
# b.5.999.1   b   5 999    1 100  0
# b.1.100.2   b   1 100    2 100  0
# b.2.999.2   b   2 999    2  99  0
# c.3.999.2   c   3 999    2 100  2
# e.4.100.2   e   4 100    2  99  1
# b.5.999.2   b   5 999    2 100  3
# b.1.100.3   b   1 100    3  99  1
# b.2.999.3   b   2 999    3  99  2
# c.3.999.3   c   3 999    3  99  2
# e.4.100.3   e   4 100    3  99  1
# b.5.999.3   b   5 999    3  99  2

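Note that we now have just one column for the equivalent of your "L" columns and one for the equivalent of your "EFFORT_L" columns. A "time" variable has been created (the equivalent of your census lines 1-8).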

Answering your questions from today

With a couple of simple ifelse statements, you can easily solve all of the problems you have asked about so far.

# Your first question from today
myDFLong$V <- with(myDFLong, ifelse(ID3 == 999 & V == 99, NA, V))
# Continuation from that point
myDFLong$V <- with(myDFLong, ifelse(EV > 0 & is.na(V), 0, V))
myDFLong
#           ID1 ID2 ID3 time   V EV
# b.1.100.1   b   1 100    1  99  5
# b.2.999.1   b   2 999    1   0  5
# c.3.999.1   c   3 999    1   0  3
# e.4.100.1   e   4 100    1 100  3
# b.5.999.1   b   5 999    1 100  0
# b.1.100.2   b   1 100    2 100  0
# b.2.999.2   b   2 999    2  NA  0
# c.3.999.2   c   3 999    2 100  2
# e.4.100.2   e   4 100    2  99  1
# b.5.999.2   b   5 999    2 100  3
# b.1.100.3   b   1 100    3  99  1
# b.2.999.3   b   2 999    3   0  2
# c.3.999.3   c   3 999    3   0  2
# e.4.100.3   e   4 100    3  99  1
# b.5.999.3   b   5 999    3   0  2

Final step: go back to a wide format if you wish

You can reshape back to the wide format with base R, but in this case it is easier to use the "reshape2" package, as follows:

library(reshape2)
myDF2 <- melt(myDFLong, id.vars=1:4)
myDFFinal <- dcast(myDF2, ID1 + ID2 + ID3 ~ variable + time)
myDFFinal
#   ID1 ID2 ID3 V_1 V_2 V_3 EV_1 EV_2 EV_3
# 1   b   1 100  99 100  99    5    0    1
# 2   b   2 999   0  NA   0    5    0    2
# 3   b   5 999 100 100   0    0    3    2
# 4   c   3 999   0 100   0    3    2    2
# 5   e   4 100 100  99  99    3    1    1

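However, I would suggest that you only do this at the very end. Many things, such as plotting functions, expect the data to be in a long or semi-long format to begin with, so it may be worth your while to think about your data that way.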

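Note that because your data currently has named rows, you would need to add those names to the data as a column in order to make full use of them as additional ID variables.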
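
To tie this back to the original question, here is a minimal sketch of the same idea applied to `L`/`EFFORT_L`-style columns. The data frame `toy` is made up for illustration and has only two census lines. The `varying`, `v.names` and `times` arguments are spelled out explicitly rather than relying on `reshape()` to guess them from the column names:

```r
# Hypothetical toy data with two census lines instead of eight
toy <- data.frame(
  ART = c("009", "031", "012"),
  L1 = c(NA, 2, NA),
  L2 = c(NA, NA, 7),
  EFFORT_L1 = c(10, 10, 0),
  EFFORT_L2 = c(10, 0, 10),
  stringsAsFactors = FALSE
)

# One row per (ART, line), with a single L column and a single EFFORT column
toy_long <- reshape(
  toy, direction = "long",
  varying = list(c("L1", "L2"), c("EFFORT_L1", "EFFORT_L2")),
  v.names = c("L", "EFFORT"),
  timevar = "line", times = 1:2,
  idvar = "ART"
)

# A single ifelse now covers every census line at once
toy_long$L <- with(toy_long, ifelse(is.na(L) & EFFORT > 0, 0, L))
toy_long
```

With the data in this shape, the replacement rule is written once, no matter how many census lines there are.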

Answer 2 (score: 1)

If the number of "L" variables is not fixed, you can use:

l.vars <- grep("^L\\d$", names(df),value=TRUE)
for (v in l.vars) {
  effort.var <- paste0("EFFORT_", v)
  df[is.na(df[,v]) & df[,effort.var] > 0, v] <- 0
}
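
As a possible variation on the loop above, here is a sketch wrapped in a function. The name `fill_surveyed_na` and its arguments are invented for this example. It assumes you may have more than nine lines (hence `\\d+` rather than `\\d`), that some effort columns could be missing, and that effort values could themselves be NA. That last case is guarded explicitly, since an NA in a logical index can make assignment into a data frame fail:

```r
# Hypothetical helper: replace NA with 0 in each "L" column wherever the
# matching effort column is present, non-missing and greater than 0
fill_surveyed_na <- function(data, pattern = "^L\\d+$",
                             effort_prefix = "EFFORT_") {
  l_vars <- grep(pattern, names(data), value = TRUE)
  for (v in l_vars) {
    effort_var <- paste0(effort_prefix, v)
    # Skip "L" columns that have no matching effort column
    if (!(effort_var %in% names(data))) next
    effort <- data[[effort_var]]
    hit <- is.na(data[[v]]) & !is.na(effort) & effort > 0
    data[[v]][hit] <- 0
  }
  data
}

# Made-up example with a two-digit line and an NA effort value
toy <- data.frame(
  L1 = c(NA, 2, NA),
  L10 = c(NA, NA, 7),
  EFFORT_L1 = c(10, 10, 0),
  EFFORT_L10 = c(NA, 5, 10)
)
out <- fill_surveyed_na(toy)
out
```

The function returns a modified copy rather than changing `df` in place, so you would assign the result back, for example `df <- fill_surveyed_na(df)`.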