I am currently working on a data transformation. The data is not very large, about 190k rows.
I wrote a for loop like this:
for (i in 1:nrow(df2)) {
  # a
  record.a <- df[which(df$first_lat == df2[i, "third_lat"]
                       & df$first_lon == df2[i, "third_lon"]
                       & df$sixth_lat == df2[i, "fourth_lat"]
                       & df$sixth_lon == df2[i, "fourth_lon"]
                       & df[, 4] == df2[i, 4]
                       & df[, 3] == df2[i, 5]), ]
  df2[i, 18] <- ifelse(nrow(record.a) != 0, record.a$order_cnt, NA)
  # b
  record.b <- df[which(df$fifth_lat == df2[i, "third_lat"]
                       & df$fifth_lon == df2[i, "third_lon"]
                       & df$sixth_lat == df2[i, "second_lat"]
                       & df$sixth_lon == df2[i, "second_lon"]
                       & df[, 4] == df2[i, 4]
                       & df[, 3] == df2[i, 5]), ]
  df2[i, 19] <- ifelse(nrow(record.b) != 0, record.b$order_cnt, NA)
  # c
  record.c <- df[which(df$fifth_lat == df2[i, "first_lat"]
                       & df$fifth_lon == df2[i, "first_lon"]
                       & df$fourth_lat == df2[i, "second_lat"]
                       & df$fourth_lon == df2[i, "second_lon"]
                       & df[, 4] == df2[i, 4]
                       & df[, 3] == df2[i, 5]), ]
  df2[i, 20] <- ifelse(nrow(record.c) != 0, record.c$order_cnt, NA)
  # d
  record.d <- df[which(df$third_lat == df2[i, "first_lat"]
                       & df$third_lon == df2[i, "first_lon"]
                       & df$fourth_lat == df2[i, "sixth_lat"]
                       & df$fourth_lon == df2[i, "sixth_lon"]
                       & df[, 4] == df2[i, 4]
                       & df[, 3] == df2[i, 5]), ]
  df2[i, 21] <- ifelse(nrow(record.d) != 0, record.d$order_cnt, NA)
  # e
  record.e <- df[which(df$third_lat == df2[i, "fifth_lat"]
                       & df$third_lon == df2[i, "fifth_lon"]
                       & df$second_lat == df2[i, "sixth_lat"]
                       & df$second_lon == df2[i, "sixth_lon"]
                       & df[, 4] == df2[i, 4]
                       & df[, 3] == df2[i, 5]), ]
  df2[i, 22] <- ifelse(nrow(record.e) != 0, record.e$order_cnt, NA)
  # f
  record.f <- df[which(df$first_lat == df2[i, "fifth_lat"]
                       & df$first_lon == df2[i, "fifth_lon"]
                       & df$second_lat == df2[i, "fourth_lat"]
                       & df$second_lon == df2[i, "fourth_lon"]
                       & df[, 4] == df2[i, 4]
                       & df[, 3] == df2[i, 5]), ]
  df2[i, 23] <- ifelse(nrow(record.f) != 0, record.f$order_cnt, NA)
}
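So, basically, I need to fill 6 columns of df2 from df, each using its own matching criterion. In the for loop, nrow(df2) is about 190k, and it runs extremely slowly. I checked the result with View(df2) and it is correct. Is there any way to make it faster? I may apply the same transformation to larger datasets in the future.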
DF: df
DF2: df2
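The data relates to a grid on a map. df2 is basically a subset of df with 6 extra columns added. df and df2 both have the same lon and lat information.
Each grid_id represents a hexagonal region on the map. Each hexagon is connected to six other hexagons, each through two pairs of lon and lat. What I want to do is find a specific value from the six surrounding hexagons (in df) to fill the columns (a, b, c, d, e, f) in df2. In addition, I need two further conditions, namely hours and ten_mins_interval: (df[,4]==df2[i,4] & df[,3]==df2[i,5])
So I think the logic is:
Answer 0 (score: 0)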
If you start from your current df2[,1:17], you can use the merge command to add the column that becomes df2[,18]:
df2 <- merge(df[,c("first_lat","first_lon","sixth_lat","sixth_lon","col4name","col3name","order_cnt")],
             df2,
             by.x=c("first_lat","first_lon","sixth_lat","sixth_lon","col4name","col3name"),
             by.y=c("third_lat","third_lon","fourth_lat","fourth_lon","col4name","col5name"),
             all.y=TRUE)
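You need to replace col4name with the name of the fourth column, and so on; I cannot tell from the screenshot what they might be. Five more versions of this command can easily be produced to add the other five columns. Because the operation works on whole vectors, it will probably be faster than the loop. This is untested, since the data was not provided in a suitable format.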
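For what it is worth, here is a minimal sketch of a related vectorised approach that covers all six columns at once by using match() on pasted keys instead of merge(). It is not the code from the question or from the answer above. The helper names (make_key, corner_cols, lookup_order_cnt), the toy data, and the time column names hour and ten_mins_interval are all assumptions made for illustration; the real names of df[,3], df[,4], df2[,4] and df2[,5] are not visible in the post. It also assumes that equal coordinates convert to identical text when pasted.

```r
# Build one text key per row from the given columns.
make_key <- function(data, cols) {
  do.call(paste, c(unname(as.list(data[cols])), sep = "|"))
}

# Expand corner names, e.g. c("first", "sixth"), into
# c("first_lat", "first_lon", "sixth_lat", "sixth_lon").
corner_cols <- function(corners) {
  as.vector(rbind(paste0(corners, "_lat"), paste0(corners, "_lon")))
}

# For each row of `target`, return order_cnt from the first row of `source`
# whose src corners and time columns equal the target's tgt corners and
# time columns; NA when there is no such row.
lookup_order_cnt <- function(source, target, src, tgt, time_cols) {
  idx <- match(make_key(target, c(corner_cols(tgt), time_cols)),
               make_key(source, c(corner_cols(src), time_cols)))
  source$order_cnt[idx]
}

# The six corner pairings, transcribed from the loop in the question.
mappings <- list(
  a = list(src = c("first", "sixth"),  tgt = c("third", "fourth")),
  b = list(src = c("fifth", "sixth"),  tgt = c("third", "second")),
  c = list(src = c("fifth", "fourth"), tgt = c("first", "second")),
  d = list(src = c("third", "fourth"), tgt = c("first", "sixth")),
  e = list(src = c("third", "second"), tgt = c("fifth", "sixth")),
  f = list(src = c("first", "second"), tgt = c("fifth", "fourth"))
)

# Toy data (hypothetical): two hexagons with made-up coordinates.
corners <- c("first", "second", "third", "fourth", "fifth", "sixth")
df <- as.data.frame(matrix(seq_len(24) + 0.5, nrow = 2,
                           dimnames = list(NULL, corner_cols(corners))))
# Make hexagon 2's first/sixth edge coincide with hexagon 1's third/fourth edge.
df[2, c("first_lat", "first_lon", "sixth_lat", "sixth_lon")] <-
  unlist(df[1, c("third_lat", "third_lon", "fourth_lat", "fourth_lon")])
df$hour <- c(8, 8)               # assumed name for one time column
df$ten_mins_interval <- c(3, 3)  # assumed name for the other time column
df$order_cnt <- c(10, 20)
df2 <- df

# Fill the six new columns without looping over rows.
time_cols <- c("hour", "ten_mins_interval")
for (name in names(mappings)) {
  m <- mappings[[name]]
  df2[[name]] <- lookup_order_cnt(df, df2, m$src, m$tgt, time_cols)
}
```

Unlike merge(), match() keeps df2's row order and row count. It returns only the first matching row, which mirrors the loop, where ifelse() with a scalar test keeps only the first element of record.a$order_cnt. In the toy data, hexagon 1 picks up hexagon 2's order_cnt through pairing a, and hexagon 2 picks up hexagon 1's through pairing d. If the time columns are named differently in df and df2, the two key column vectors would need to be passed separately.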