My data frame data1 (more than 1.5 million rows) has the following structure:
data1 <- data.frame(NEW_UPC=c(11820005991,11820005991,11820005991,11820005991,11820005991,11820005991,11820005991,11820005991,11820005991,11820005991,11820005991,11820005991,11820005992,11820005992,11820005992,11820005992,11820005992,11820005992,11820005992,11820005992,11820005992,11820005993,11820005993,11820005993,11820005993,11820005993,11820005993,11820005993,11820005993,11820005993,11820005994,11820005994,11820005994,11820005994,11820005994,11820005994,11820005995,11820005995,11820005995,11820005995,11820005995,11820005995,11820005995,11820005995,11820005995),
IRI_KEY=c(1073521,1073521,1073521,1073525,1073525,1073525,1078106,1078106,1078106,1078107,1078107,1078107,1073521,1073521,1073521,1073525,1073525,1073525,1078106,1078106,1078106,1073521,1073521,1073521,1073525,1073525,1073525,1078106,1078106,1078106,1073521,1073521,1073525,1073525,1078106,1078106,1073521,1073521,1073521,1073525,1073525,1073525,1078106,1078106,1078106),
WEEK = c(1229,1230,1232,1218,1224,1229,1282,1285,1287,1229,1230,1232,1229,1230,1232,1218,1224,1229,1282,1285,1287,1229,1230,1232,1217,1221,1227,1270,1272,1273,1273,1274,1270,1272,1217,1221,1229,1230,1232,1218,1224,1229,1282,1285,1287),
END=c(1232,1232,1232,1229,1229,1229,1287,1287,1287,1232,1232,1232,1232,1232,1232,1229,1229,1229,1287,1287,1287,1232,1232,1232,1227,1227,1227,1273,1273,1273,1274,1274,1272,1272,1221,1221,1232,1232,1232,1229,1229,1229,1287,1287,1287))
I need to add a column Exit.time computed from the values in WEEK and END, with a cutoff of 1287. Exit.time should be 0 or 1 according to the following logic:
If WEEK == 1287, then Exit.time = 0.
If WEEK is not 1287 but WEEK == END, then Exit.time = 1; otherwise Exit.time = 0.
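As a quick check of the three cases, here is a minimal sketch on three made-up rows (the data frame toy and its values are hypothetical, chosen only so that each branch of the rule is hit once). A nested ifelse() mirrors the if / else if / else structure directly:

```r
# Hypothetical rows: one per branch of the rule
toy <- data.frame(WEEK = c(1287, 1232, 1229),
                  END  = c(1287, 1232, 1232))

# Row 1: WEEK == 1287                  -> 0
# Row 2: WEEK != 1287 and WEEK == END  -> 1
# Row 3: WEEK != END                   -> 0
toy$Exit.time <- ifelse(toy$WEEK == 1287, 0,
                        ifelse(toy$WEEK == toy$END, 1, 0))
print(toy)
```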
To do this, I tried the following for loop, which does what I need on the dummy dataset above:
for (i in seq_len(nrow(data1))) {
  if (data1$WEEK[i] == 1287) {
    data1$Exit.time[i] <- 0   # WEEK at the cutoff -> 0
  } else if (data1$WEEK[i] == data1$END[i]) {
    data1$Exit.time[i] <- 1   # WEEK equals END and is not the cutoff -> 1
  } else {
    data1$Exit.time[i] <- 0   # all other rows -> 0
  }
}
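The problem is that when I run this loop on the real dataset, I still have no output after an hour. I assume the loop is too inefficient for a dataset of this size. Is there another way to do this? I would prefer to keep the row order of data1, because I need to do some merges later.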
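A common reason such a loop is slow is that every data1$Exit.time[i] <- ... assignment goes through the data frame replacement method, which can copy the data on each iteration. If you want to keep a loop, one sketch (the helper exit_time_loop and the sample vectors are hypothetical, not from the question or answers) is to fill a preallocated plain vector and assign it to the data frame once. The vectorized answers are simpler still:

```r
# Hypothetical helper: same rule as the loop above, but it writes into a
# preallocated numeric vector instead of a data frame column.
exit_time_loop <- function(week, end, cutoff = 1287) {
  out <- numeric(length(week))       # preallocated, all zeros
  for (i in seq_along(week)) {
    # Only the "exit" case needs a write; every other row stays 0
    if (week[i] != cutoff && week[i] == end[i]) {
      out[i] <- 1
    }
  }
  out
}

# Small hypothetical input
week <- c(1229, 1230, 1232, 1287)
end  <- c(1232, 1232, 1232, 1287)
exit_time_loop(week, end)
```

On the real data this would be called as data1$Exit.time <- exit_time_loop(data1$WEEK, data1$END), which is elementwise and leaves the row order unchanged.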
Answer 0 (score: 4)
Since Exit.time must be 1 when (WEEK == END) & WEEK != 1287 and 0 otherwise, you can apply as.numeric to the result of (WEEK == END) & WEEK != 1287, which turns TRUE into 1 and FALSE into 0.
data1$Exit.time <- with(data1, as.numeric(WEEK != 1287 & WEEK == END))
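To confirm that the single condition WEEK != 1287 & WEEK == END reproduces the three-branch rule from the question, a small sketch compares it with a direct ifelse() transcription. The data frame rows is hypothetical and only covers each combination of the two conditions:

```r
# Hypothetical rows covering each combination of the two conditions
rows <- data.frame(WEEK = c(1287, 1287, 1232, 1229),
                   END  = c(1287, 1290, 1232, 1232))

# Single logical condition, coerced to 0/1
a <- with(rows, as.numeric(WEEK != 1287 & WEEK == END))

# Direct transcription of the if / else if / else rule from the question
b <- with(rows, ifelse(WEEK == 1287, 0, ifelse(WEEK == END, 1, 0)))

identical(a, b)
```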
Answer 1 (score: 3)
There are several ways to write this; they differ mainly in syntax and all do essentially the same thing.
Base R:
data1$Exit.time <- (data1$WEEK != 1287 & data1$WEEK == data1$END)*1
This means typing data1 repeatedly, so there is a shortcut:
data1 <- within(data1, {
Exit.time <- (WEEK != 1287 & WEEK == END)*1
})
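Base R's transform() is another shortcut along the same lines. A minimal sketch on a few hypothetical rows (the data frame small is made up for illustration):

```r
# Hypothetical sample rows
small <- data.frame(WEEK = c(1229, 1232, 1287),
                    END  = c(1232, 1232, 1287))

# transform() evaluates the expression inside the data frame and
# appends Exit.time as a new last column; row order is unchanged
small <- transform(small, Exit.time = (WEEK != 1287 & WEEK == END) * 1)
small$Exit.time
```

Like within(), transform() returns a modified copy, so the result has to be assigned back.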
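Tidyverse:
The tidyverse is a collection of packages that is well suited to data manipulation. We are using the dplyr package, which is part of the tidyverse, so you can load either the whole tidyverse or just dplyr: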
library(tidyverse)
data1 <- data1 %>%
mutate(
Exit.time = (WEEK != 1287 & WEEK == END)*1
)
(I multiply by 1 to convert TRUE/FALSE to 0/1 because it requires less typing.)
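For reference, multiplying a logical vector by 1 gives the same double values as as.numeric(); as.integer() is an alternative if integer storage is preferred. A minimal sketch on a hypothetical vector x:

```r
x <- c(TRUE, FALSE, TRUE)   # hypothetical logical vector

x * 1           # double: 1 0 1
as.numeric(x)   # same values and same type as x * 1
as.integer(x)   # same values, stored as integer
```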
Answer 2 (score: 0)
Using data.table:
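library(data.table)
# setDT() converts data1 to a data.table in place; := adds Exit.time by reference
setDT(data1)[, Exit.time := ifelse(WEEK == 1287, 0, ifelse(WEEK != 1287 & WEEK == END, 1, 0))]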