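I have 8 census lines (L1:L8). Currently, some records have NA instead of 0 after being surveyed. I want to replace all NA with 0 in each column (L1:L8) when the corresponding effort column (EFFORT_L1:EFFORT_L8) has a value greater than 0 (which means the line has been surveyed).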
Sample data:
df <-structure(list(KARTA = c("02C2H", "02C2H", "02C2H", "02C2H",
"02C2H", "02C2H"), YEAR = c(1997L, 1997L, 1997L, 1997L, 1997L,
1997L), ART = c("009", "031", "012", "057", "065", "073"), L1 = c(NA,
NA, NA, NA, 2, NA), L2 = c(NA, NA, 7, NA, 3, NA), L3 = c(NA,
NA, NA, NA, 1, NA), L4 = c(NA, NA, NA, NA, 1, NA), L5 = c(NA,
NA, NA, NA, 1, NA), L6 = c(NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_), L7 = c(NA, NA, NA, 1, NA, 1), L8 = c(NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_), EFFORT_L1 = c(10,
10, 10, 10, 10, 10), EFFORT_L2 = c(10, 10, 10, 10, 10, 10), EFFORT_L3 = c(9.625,
9.625, 9.625, 9.625, 9.625, 9.625), EFFORT_L4 = c(10, 10, 10,
10, 10, 10), EFFORT_L5 = c(9.125, 9.125, 9.125, 9.125, 9.125,
9.125), EFFORT_L6 = c(9.75, 9.75, 9.75, 9.75, 9.75, 9.75), EFFORT_L7 = c(9.75,
9.75, 9.75, 9.75, 9.75, 9.75), EFFORT_L8 = c(10, 10, 10, 10,
10, 10), Total_Route_Effort = c(78.25, 78.25, 78.25, 78.25, 78.25,
78.25)), .Names = c("KARTA", "YEAR", "ART", "L1", "L2", "L3",
"L4", "L5", "L6", "L7", "L8", "EFFORT_L1", "EFFORT_L2", "EFFORT_L3",
"EFFORT_L4", "EFFORT_L5", "EFFORT_L6", "EFFORT_L7", "EFFORT_L8",
"Total_Route_Effort"), row.names = c(NA, 6L), class = "data.frame")
Sample code for a single column (note that I am looking for an efficient solution for all eight columns):
df[is.na(df[,"L1"]) & df[,"EFFORT_L1"] > 0, "L1"] <- 0
Answer 0 (score: 4)
df[paste0("L", 1:8)][is.na(df[paste0("L", 1:8)])
& df[paste0("EFFORT_L", 1:8)] > 0] <- 0
Result:
> df
KARTA YEAR ART L1 L2 L3 L4 L5 L6 L7 L8 EFFORT_L1 EFFORT_L2
1 02C2H 1997 009 0 0 0 0 0 0 0 0 10 10
2 02C2H 1997 031 0 0 0 0 0 0 0 0 10 10
3 02C2H 1997 012 0 7 0 0 0 0 0 0 10 10
4 02C2H 1997 057 0 0 0 0 0 0 1 0 10 10
5 02C2H 1997 065 2 3 1 1 1 0 0 0 10 10
6 02C2H 1997 073 0 0 0 0 0 0 1 0 10 10
EFFORT_L3 EFFORT_L4 EFFORT_L5 EFFORT_L6 EFFORT_L7 EFFORT_L8
1 9.625 10 9.125 9.75 9.75 10
2 9.625 10 9.125 9.75 9.75 10
3 9.625 10 9.125 9.75 9.75 10
4 9.625 10 9.125 9.75 9.75 10
5 9.625 10 9.125 9.75 9.75 10
6 9.625 10 9.125 9.75 9.75 10
Total_Route_Effort
1 78.25
2 78.25
3 78.25
4 78.25
5 78.25
6 78.25
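To see why the one-liner above works, here is a minimal sketch on a made-up two-column data frame (the `toy` data and the `mask` name are assumptions for illustration, not part of the original answer). Both `is.na()` and `>` return logical matrices of the same shape when applied to data frame subsets, so their `&` can be used as a cell-by-cell mask:

```r
# Tiny made-up data frame: two count columns and their matching effort columns.
toy <- data.frame(
  L1 = c(NA, 2, NA),
  L2 = c(NA, NA, 5),
  EFFORT_L1 = c(10, 10, 0),
  EFFORT_L2 = c(0, 8, 8)
)

count_cols  <- paste0("L", 1:2)
effort_cols <- paste0("EFFORT_L", 1:2)

# Logical matrix: TRUE where the count is missing AND the effort is positive.
mask <- is.na(toy[count_cols]) & toy[effort_cols] > 0

# Assigning through the mask only touches the TRUE cells.
toy[count_cols][mask] <- 0
toy
```

Cells whose effort is 0 keep their NA, which is the behaviour the question asks for. The approach relies on the count and effort columns being listed in the same order.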
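Answer 1 (score: 3)
This does not exactly answer your question, but it might help with future problems: looking at your questions from today, have you considered converting your data to a semi-long format and working with that?
Here is a toy example: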
set.seed(1)
myDF <- data.frame(
ID1 = sample(letters[1:5], 5, replace = TRUE),
ID2 = 1:5, ID3 = "999",
V1 = 99, V2 = 99, V3 = 99,
EV1 = sample(0:5, 5, replace = TRUE),
EV2 = sample(0:3, 5, replace = TRUE),
EV3 = sample(0:2, 5, replace = TRUE),
stringsAsFactors = FALSE
)
myDF$ID3[c(1, 4)] <- 100
myDF$V1[c(4, 5)] <- 100
myDF$V2[c(1, 3, 5)] <- 100
myDF
# ID1 ID2 ID3 V1 V2 V3 EV1 EV2 EV3
# 1 b 1 100 99 100 99 5 0 1
# 2 b 2 999 99 99 99 5 0 2
# 3 c 3 999 99 100 99 3 2 2
# 4 e 4 100 100 99 99 3 1 1
# 5 b 5 999 100 100 99 0 3 2
myDFLong <- reshape(myDF, direction = "long", idvar = 1:3,
varying = 4:ncol(myDF), sep = "")
myDFLong
# ID1 ID2 ID3 time V EV
# b.1.100.1 b 1 100 1 99 5
# b.2.999.1 b 2 999 1 99 5
# c.3.999.1 c 3 999 1 99 3
# e.4.100.1 e 4 100 1 100 3
# b.5.999.1 b 5 999 1 100 0
# b.1.100.2 b 1 100 2 100 0
# b.2.999.2 b 2 999 2 99 0
# c.3.999.2 c 3 999 2 100 2
# e.4.100.2 e 4 100 2 99 1
# b.5.999.2 b 5 999 2 100 3
# b.1.100.3 b 1 100 3 99 1
# b.2.999.3 b 2 999 3 99 2
# c.3.999.3 c 3 999 3 99 2
# e.4.100.3 e 4 100 3 99 1
# b.5.999.3 b 5 999 3 99 2
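Note that we now have just one column for the equivalent of your "L" columns and one for the equivalent of your "EFFORT_L" columns. A "time" variable has been created (equivalent to your census lines 1-8).
With a few simple ifelse statements, you can easily solve all of your problems so far.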
# Your first question from today
myDFLong$V <- with(myDFLong, ifelse(ID3 == 999 & V == 99, NA, V))
# Continuation from that point
myDFLong$V <- with(myDFLong, ifelse(EV > 0 & is.na(V), 0, V))
myDFLong
# ID1 ID2 ID3 time V EV
# b.1.100.1 b 1 100 1 99 5
# b.2.999.1 b 2 999 1 0 5
# c.3.999.1 c 3 999 1 0 3
# e.4.100.1 e 4 100 1 100 3
# b.5.999.1 b 5 999 1 100 0
# b.1.100.2 b 1 100 2 100 0
# b.2.999.2 b 2 999 2 NA 0
# c.3.999.2 c 3 999 2 100 2
# e.4.100.2 e 4 100 2 99 1
# b.5.999.2 b 5 999 2 100 3
# b.1.100.3 b 1 100 3 99 1
# b.2.999.3 b 2 999 3 0 2
# c.3.999.3 c 3 999 3 0 2
# e.4.100.3 e 4 100 3 99 1
# b.5.999.3 b 5 999 3 0 2
You can convert back to wide format with base R, but in this case it is easier with the "reshape2" package, as follows:
library(reshape2)
myDF2 <- melt(myDFLong, id.vars=1:4)
myDFFinal <- dcast(myDF2, ID1 + ID2 + ID3 ~ variable + time)
myDFFinal
# ID1 ID2 ID3 V_1 V_2 V_3 EV_1 EV_2 EV_3
# 1 b 1 100 99 100 99 5 0 1
# 2 b 2 999 0 NA 0 5 0 2
# 3 b 5 999 100 100 0 0 3 2
# 4 c 3 999 0 100 0 3 2 2
# 5 e 4 100 100 99 99 3 1 1
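However, I would suggest doing this only at the very end. Many things, such as plotting functions, expect data in long or semi-long format to begin with, so it may be worth your time to think about how your data are structured.
Note that because your data currently have named rows, you will need to add them as a column in your data to make full use of them as additional ID variables.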
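As a rough sketch of how the same idea might look on data shaped like the question's, the example below uses a cut-down, made-up data frame (`census`, with only two census lines and invented effort values). It assumes the ID columns come first and that every remaining column is either a count or an effort column, so that `reshape()` can guess the split from the names:

```r
# Cut-down, made-up version of the question's data (two census lines only).
census <- data.frame(
  KARTA = "02C2H", YEAR = 1997L,
  ART = c("009", "012", "065"),
  L1 = c(NA, NA, 2), L2 = c(NA, 7, 3),
  EFFORT_L1 = c(10, 10, 10), EFFORT_L2 = c(0, 10, 10),
  stringsAsFactors = FALSE
)

# With sep = "", reshape() splits each name before its trailing digits,
# giving one "L" column, one "EFFORT_L" column, and a "time" column.
census_long <- reshape(census, direction = "long",
                       idvar = c("KARTA", "YEAR", "ART"),
                       varying = 4:ncol(census), sep = "")

# One ifelse covers every census line at once.
census_long$L <- with(census_long, ifelse(is.na(L) & EFFORT_L > 0, 0, L))
census_long
```

In this sketch the count for line 2 of the first record stays NA because its effort is 0.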
Answer 2 (score: 1)
If the number of "L" variables is not fixed, you can use:
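# Find every count column named "L" followed by digits (\\d+ also matches L10, L11, ...)
l.vars <- grep("^L\\d+$", names(df), value = TRUE)
for (v in l.vars) {
  # Name of the effort column paired with this count column
  effort.var <- paste0("EFFORT_", v)
  # Set missing counts to 0 wherever the line was actually surveyed
  df[is.na(df[, v]) & df[, effort.var] > 0, v] <- 0
}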
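If this needs to be reused, the loop could be wrapped in a small function. The sketch below is only one way to do it: `fill_surveyed_zeros` and its arguments are hypothetical names, and it makes two extra assumptions beyond the answer above, namely that count columns without a matching effort column should be skipped, and that a missing effort value should leave the count as NA. The `which()` call handles the second case, because a logical index containing NA is not allowed in data frame assignment.

```r
# Hypothetical helper: zero-fill NA counts wherever the paired effort is > 0.
fill_surveyed_zeros <- function(data, count_pattern = "^L\\d+$",
                                effort_prefix = "EFFORT_") {
  count_cols <- grep(count_pattern, names(data), value = TRUE)
  # Keep only count columns that actually have a matching effort column.
  count_cols <- count_cols[paste0(effort_prefix, count_cols) %in% names(data)]
  for (v in count_cols) {
    effort <- data[[paste0(effort_prefix, v)]]
    # which() drops NA comparisons, so rows with missing effort are left alone.
    rows <- which(is.na(data[[v]]) & effort > 0)
    data[[v]][rows] <- 0
  }
  data
}

# Made-up example: L10 has no effort column, EFFORT_L2 has a missing value.
toy <- data.frame(L1 = c(NA, 3, NA), L2 = c(NA, NA, 1), L10 = c(NA, NA, NA),
                  EFFORT_L1 = c(5, 5, 0), EFFORT_L2 = c(NA, 2, 2))
res <- fill_surveyed_zeros(toy)
res
```

The function returns a modified copy, so the original data frame is left unchanged unless the result is assigned back.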