Question

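I'm working with GPS locations that occur within a given time period, "dateperiod". For each row, I want to take its dateperiod value, look up the column named after that dateperiod, and extract the value (the disturbance distance) for that row. I'm also doing this in a loop over multiple disturbance data frames. Dummy dataset:

Example basic data (data_basic_DT):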

structure(list(EndId = 1:9, dateperiod = c(141101L, 141101L, 
141101L, 141101L, 141101L, 141101L, 141101L, 141101L, 141101L
)), .Names = c("EndId", "dateperiod"), row.names = c(NA, -9L), class = "data.frame")

Example disturbance data 1 (low_roads):

structure(list(EndId = 1:9, dateperiod = c(141101L, 141101L, 
141101L, 141101L, 141101L, 141101L, 141101L, 141101L, 141101L
), `151101` = c(710.211, 684.471, 676.831, 762.955, 704.06, 674.685, 
682.495, 686.586, 696.348), `150501` = c(710.211, 684.471, 676.831, 
762.955, 704.06, 674.685, 682.495, 686.586, 696.348), `141101` = c(710.211, 
684.471, 676.831, 762.955, 704.06, 674.685, 682.495, 686.586, 
696.348), `140501` = c(710.211, 684.471, 676.831, 762.955, 704.06, 
674.685, 682.495, 686.586, 696.348), `131101` = c(710.211, 684.471, 
676.831, 762.955, 704.06, 674.685, 682.495, 686.586, 696.348), 
    `130501` = c(710.211, 684.471, 676.831, 762.955, 704.06, 
    674.685, 682.495, 686.586, 696.348), `121101` = c(710.211, 
    684.471, 676.831, 762.955, 704.06, 674.685, 682.495, 686.586, 
    696.348)), .Names = c("EndId", "dateperiod", "151101", "150501", 
"141101", "140501", "131101", "130501", "121101"), row.names = c(NA, 
-9L), class = "data.frame")

Example disturbance data 2 (high_roads):

structure(list(EndId = 1:9, dateperiod = c(141101L, 141101L, 
141101L, 141101L, 141101L, 141101L, 141101L, 141101L, 141101L
), `151101` = c(806.415, 802.56, 502.35, 1234.2, 704.06, 685.23, 
682.495, 1002.3, 696.348), `150501` = c(710.211, 684.471, 676.831, 
762.955, 704.06, 802.56, 502.35, 1234.2, 696.348), `141101` = c(710.211, 
130.25, 453.25, 762.955, 704.06, 674.685, 682.495, 686.586, 696.348
), `140501` = c(710.211, 684.471, 802.56, 502.35, 1234.2, 674.685, 
682.495, 686.586, 696.348), `131101` = c(710.211, 684.471, 676.831, 
762.955, 704.06, 674.685, 502.35, 1234.2, 704.06), `130501` = c(710.211, 
684.471, 676.831, 762.955, 704.06, 674.685, 682.495, 686.586, 
696.348), `121101` = c(502.35, 1234.2, 704.06, 762.955, 704.06, 
674.685, 682.495, 686.586, 696.348)), .Names = c("EndId", "dateperiod", 
"151101", "150501", "141101", "140501", "131101", "130501", "121101"
), row.names = c(NA, -9L), class = c("data.table", "data.frame"
))

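So, for each EndId, I want it to look at dateperiod, see that in this example it is 141101, look at the column "141101", extract the value, and put it into a new column, looping over low_roads and high_roads.

With some help (below), I now have this running much faster than before, using this: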

library(fastmatch)  # provides fmatch(), a faster drop-in for match()

disturbancelist <- list(low_roads = low_roads, high_roads = high_roads)  # all the disturbance data frames
for (d in disturbancelist) {
  ## Name of the current disturbance class (assumes d has a Class column, which the dummy data omits)
  Class <- d$Class[2]
  ## Merge the basic data with each disturbance data frame to get the right distance values
  mergeex <- merge(data_basic_DT, d, by.x = "EndId", by.y = "EndId", all.y = FALSE)
  mergeexdf <- as.data.frame(mergeex)
  col.names <- names(mergeexdf)
  mergeexdf$distance <- mergeexdf[cbind(1:nrow(mergeexdf), fmatch(mergeexdf$dateperiod, col.names))]
  names(data_basic_DT)[names(data_basic_DT) == "distance"] <- Class  ## rename the column to the current disturbance class
  print(Class)
}

Now I'd like to change this code to work with data.tables so that it runs faster. It works with data.tables outside the loop, but not inside it. Any help is appreciated!

Answer 1

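If I understand you correctly, this sounds like a question I answered before: R data.frame get value from variable which is selected by another variable, vectorized. That question was about data.frames in general, but I think it is still a good solution for data.table. Edit: maybe not, based on the responses, but at least it works well on data.frames.

The idea is to use match and the names attribute to get the numeric column index for each row, then use that index to get the values. For a data frame named df, something like this: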

df$newvar <- df[cbind(1:nrow(df), match(df$dateperiod, names(df)))]

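The first index column, 1:nrow(df), essentially replaces the for loop, and the second, match(df$dateperiod, names(df)), identifies for each row the column whose name matches the value in dateperiod. It works because match operates on the whole column vector df$dateperiod and returns a vector of the same length.

Hope that helps.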
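To make the two-column index concrete, here is a small worked example. The toy data frame and its values are made up for illustration and are not the asker's data; only base R is used.

```r
# Toy data frame: each row's dateperiod names the column to read from.
df <- data.frame(
  EndId      = 1:3,
  dateperiod = c(141101L, 140501L, 141101L),
  `141101`   = c(10.5, 20.5, 30.5),
  `140501`   = c(1.5, 2.5, 3.5),
  check.names = FALSE  # keep the numeric-looking column names unchanged
)

rows <- seq_len(nrow(df))                # row index for every row: 1, 2, 3
cols <- match(df$dateperiod, names(df))  # column index per row: 3, 4, 3

# A two-column matrix index picks one (row, column) cell per row.
df$newvar <- df[cbind(rows, cols)]
print(df$newvar)  # 10.5  2.5 30.5
```

Note that matrix indexing converts the data frame to a matrix first, so this stays numeric only as long as every column is numeric; a character column would turn the extracted values into strings.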
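For the data.table loop the question asks about, one possible approach (a different technique from the matrix indexing above, offered only as a sketch) is to melt each disturbance table to long format and add the matching value with an update join. The toy tables below are invented for illustration, and the new columns are named after the list entries rather than a Class column, which is an assumption.

```r
library(data.table)

# Toy inputs (hypothetical values), shaped like the question's tables.
data_basic_DT <- data.table(EndId = 1:3, dateperiod = c(141101L, 140501L, 141101L))
low_roads <- data.table(EndId = 1:3, dateperiod = c(141101L, 140501L, 141101L),
                        `141101` = c(10.5, 20.5, 30.5), `140501` = c(1.5, 2.5, 3.5))
high_roads <- data.table(EndId = 1:3, dateperiod = c(141101L, 140501L, 141101L),
                         `141101` = c(100.5, 200.5, 300.5), `140501` = c(11.5, 12.5, 13.5))
disturbancelist <- list(low_roads = low_roads, high_roads = high_roads)

for (nm in names(disturbancelist)) {
  d <- disturbancelist[[nm]]
  datecols <- setdiff(names(d), c("EndId", "dateperiod"))
  # Long format: one row per (EndId, date column) pair.
  long <- melt(d, id.vars = "EndId", measure.vars = datecols,
               variable.name = "dateperiod", value.name = "distance",
               variable.factor = FALSE)
  long[, dateperiod := as.integer(dateperiod)]  # match the type in data_basic_DT
  # Update join: add a column named after the list entry, by reference.
  data_basic_DT[long, on = .(EndId, dateperiod), (nm) := i.distance]
}

print(data_basic_DT)
```

Because the join matches on EndId and dateperiod, it does not rely on the tables being in the same row order, at the cost of reshaping each disturbance table once per iteration.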
