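I'm working with GPS locations recorded during a given time period ("dateperiod"). For each row, I want to take its dateperiod value, find the column named after that dateperiod, and extract that column's value (the disturbance distance) for the same row. I also need to do this in a loop over several disturbance data frames. Dummy dataset: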
Sample basic data (data_basic_DT):
structure(list(EndId = 1:9, dateperiod = c(141101L, 141101L,
141101L, 141101L, 141101L, 141101L, 141101L, 141101L, 141101L
)), .Names = c("EndId", "dateperiod"), row.names = c(NA, -9L), class = "data.frame")
Sample disturbance data 1 (low_roads):
structure(list(EndId = 1:9, dateperiod = c(141101L, 141101L,
141101L, 141101L, 141101L, 141101L, 141101L, 141101L, 141101L
), `151101` = c(710.211, 684.471, 676.831, 762.955, 704.06, 674.685,
682.495, 686.586, 696.348), `150501` = c(710.211, 684.471, 676.831,
762.955, 704.06, 674.685, 682.495, 686.586, 696.348), `141101` = c(710.211,
684.471, 676.831, 762.955, 704.06, 674.685, 682.495, 686.586,
696.348), `140501` = c(710.211, 684.471, 676.831, 762.955, 704.06,
674.685, 682.495, 686.586, 696.348), `131101` = c(710.211, 684.471,
676.831, 762.955, 704.06, 674.685, 682.495, 686.586, 696.348),
`130501` = c(710.211, 684.471, 676.831, 762.955, 704.06,
674.685, 682.495, 686.586, 696.348), `121101` = c(710.211,
684.471, 676.831, 762.955, 704.06, 674.685, 682.495, 686.586,
696.348)), .Names = c("EndId", "dateperiod", "151101", "150501",
"141101", "140501", "131101", "130501", "121101"), row.names = c(NA,
-9L), class = "data.frame")
Sample disturbance data 2 (high_roads):
structure(list(EndId = 1:9, dateperiod = c(141101L, 141101L,
141101L, 141101L, 141101L, 141101L, 141101L, 141101L, 141101L
), `151101` = c(806.415, 802.56, 502.35, 1234.2, 704.06, 685.23,
682.495, 1002.3, 696.348), `150501` = c(710.211, 684.471, 676.831,
762.955, 704.06, 802.56, 502.35, 1234.2, 696.348), `141101` = c(710.211,
130.25, 453.25, 762.955, 704.06, 674.685, 682.495, 686.586, 696.348
), `140501` = c(710.211, 684.471, 802.56, 502.35, 1234.2, 674.685,
682.495, 686.586, 696.348), `131101` = c(710.211, 684.471, 676.831,
762.955, 704.06, 674.685, 502.35, 1234.2, 704.06), `130501` = c(710.211,
684.471, 676.831, 762.955, 704.06, 674.685, 682.495, 686.586,
696.348), `121101` = c(502.35, 1234.2, 704.06, 762.955, 704.06,
674.685, 682.495, 686.586, 696.348)), .Names = c("EndId", "dateperiod",
"151101", "150501", "141101", "140501", "131101", "130501", "121101"
), row.names = c(NA, -9L), class = c("data.table", "data.frame"
))
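So for each EndId, I want the code to look at its dateperiod (141101 in this example), go to the column "141101", extract the value, and put it in a new column. The loop should do this for both low_roads and high_roads.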
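As a baseline, the per-row lookup described above can be written as a plain loop over rows. This is a minimal sketch on a small hypothetical table (`toy`), not the data above. It is slow on large data, but it makes the intended result explicit.

```r
# Small hypothetical example: each row's dateperiod names the column to read from
toy <- data.frame(
  EndId      = 1:3,
  dateperiod = c(141101L, 141101L, 150501L),
  `141101`   = c(710.211, 684.471, 676.831),
  `150501`   = c(11, 22, 33),
  check.names = FALSE  # keep the numeric-looking column names unchanged
)

# Row-by-row lookup: for row i, take the column whose name equals dateperiod[i]
toy$distance <- vapply(
  seq_len(nrow(toy)),
  function(i) toy[[as.character(toy$dateperiod[i])]][i],
  numeric(1)
)
toy$distance
```

Rows 1 and 2 read from column "141101" and row 3 reads from column "150501".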
Thanks to some help (see the answer below), I now have a version that runs much faster than before:
library(fastmatch) # fmatch() is a faster drop-in for match()
disturbancelist <- list(low_roads = low_roads, high_roads = high_roads) # Lists all the disturbance data frames
for (Class in names(disturbancelist)) { ## Class = name of the current disturbance type, taken from the list names
  d <- disturbancelist[[Class]]
  ## Merge basic data and each disturbance data frame to get the right distance values
  ## (both have a dateperiod column, so merge() renames them dateperiod.x and dateperiod.y)
  mergeex <- merge(data_basic_DT, d, by = "EndId", all.y = FALSE)
  mergeexdf <- as.data.frame(mergeex)
  col.names <- names(mergeexdf)
  ## For each row, read the column whose name equals that row's dateperiod
  distance <- mergeexdf[cbind(1:nrow(mergeexdf), fmatch(as.character(mergeexdf$dateperiod.x), col.names))]
  ## Store the result in data_basic_DT under the current disturbance class name
  data_basic_DT[[Class]] <- distance[match(data_basic_DT$EndId, mergeexdf$EndId)]
  print(Class)
}
Now I'd like to change this code to work with data.tables so that it runs faster. It works with data.tables outside the loop, but not inside it. Any help is appreciated!
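One possible data.table version of the loop swaps the matrix-index lookup for a reshape-and-join: each disturbance table is melted to long format (one row per EndId and period) and joined on EndId and the period name. This is a sketch under assumptions: the toy tables and the helper `lookup_distance()` are hypothetical, and every period column name is assumed to be the dateperiod written as a string, as in the dummy data. Only `melt()`, joins with `on =`, and `:=` are actual data.table features.

```r
library(data.table)

# Hypothetical toy inputs in the same shape as the question's data
data_basic_DT <- data.table(EndId = 1:3, dateperiod = c(141101L, 141101L, 150501L))
low_roads  <- data.table(EndId = 1:3, dateperiod = c(141101L, 141101L, 150501L),
                         `141101` = c(710.211, 684.471, 676.831), `150501` = c(11, 22, 33))
high_roads <- data.table(EndId = 1:3, dateperiod = c(141101L, 141101L, 150501L),
                         `141101` = c(710.211, 130.25, 453.25), `150501` = c(44, 55, 66))

# Hypothetical helper: for each row of `basic`, return the value from the column
# of `dist` whose name equals that row's dateperiod (NA if there is no such column)
lookup_distance <- function(basic, dist) {
  period_cols <- setdiff(names(dist), c("EndId", "dateperiod"))
  # Long format: one row per EndId and period column
  long <- melt(as.data.table(dist), id.vars = "EndId", measure.vars = period_cols,
               variable.name = "period", value.name = "distance",
               variable.factor = FALSE)
  # The keys to look up, with dateperiod as character to match the column names
  keys <- data.table(EndId = basic$EndId, period = as.character(basic$dateperiod))
  # Join; the result has one value per row of keys, in the order of keys
  long[keys, on = c("EndId", "period"), distance]
}

disturbancelist <- list(low_roads = low_roads, high_roads = high_roads)
for (Class in names(disturbancelist)) {
  # Add (or overwrite) a column named after the disturbance class, by reference
  data_basic_DT[, (Class) := lookup_distance(data_basic_DT, disturbancelist[[Class]])]
}
data_basic_DT
```

Because `:=` adds each column by reference, `data_basic_DT` is not copied on every iteration. The long table has one row per EndId per period column, so with many periods the matrix-index approach may use less memory.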
Answer 0 (score: 0)
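If I understand you correctly, this sounds like a question I answered: R data.frame get value from variable which is selected by another variable, vectorized. That question was about data.frames in general, but I think the approach is still a good solution for data.table. EDIT: possibly not, based on the responses, but it works well on data.frames at least.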
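The idea is to use match and the names attribute to get the numeric index of the relevant column for each row, and then use those indices to pull out the values. For a data frame named df: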
df$newvar <- df[cbind(1:nrow(df), match(df$dateperiod, names(df)))]
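cbind() builds a two-column index matrix of (row, column) pairs. The first column, 1:nrow(df), gives the row numbers and essentially replaces the for loop; the second, match(df$dateperiod, names(df)), gives, for each row, the position of the column whose name matches that row's dateperiod value. This works because match operates on the whole column vector df$dateperiod at once and returns a vector of the same length.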
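On a data.table, as far as I know, `[` does not interpret a two-column numeric matrix as (row, column) pairs the way a data.frame does, which may explain why the trick did not carry over directly. A minimal workaround sketch, on a hypothetical toy table `dt`: convert just the period columns to a plain matrix, apply the same `cbind()`/`match()` index to it, and assign the result with `:=`.

```r
library(data.table)

# Small hypothetical table in the question's shape (not the asker's data)
dt <- data.table(EndId = 1:3, dateperiod = c(141101L, 141101L, 150501L),
                 `141101` = c(710.211, 684.471, 676.831), `150501` = c(11, 22, 33))

# Columns whose names are dateperiods
period_cols <- setdiff(names(dt), c("EndId", "dateperiod"))
# Plain numeric matrix of just those columns, so base R matrix indexing applies
m <- as.matrix(dt[, period_cols, with = FALSE])
# Row i -> column of m whose name equals dateperiod[i]; stored by reference
dt[, distance := m[cbind(seq_len(.N), match(as.character(dateperiod), period_cols))]]
dt$distance
```

Matching against `period_cols` rather than `names(dt)` keeps the column positions aligned with the matrix `m`.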
Hope that helps.