How to improve performance: looping over each level of a factor (in R)

Time: 2014-10-20 05:06:19

Tags: r performance loops

I find that looping over each level of a factor is quite slow.

The data is a timetable for some trains:

 col1      col2       col3     col4            col5
 train    start     destiny    starttime     arrivaltime
[factor] [factor]  [factor]   [date&time]   [date&time]

There are 10M rows and about 1k trains, so each train has ~10k rows.

I tried the following test code:

data = data[order(data$train, data$starttime), ]   # sort according to train, and then according to starttime
length1  = numeric( length(levels(data$train))  )
ii = 1
sub = data[1,]   # initialize it           
for (t in levels(data$train))
{
  sub =  subset(data, train==t)  #subset of each train
  length1[ii] = nrow(sub)
  ii = ii +1 
  print(ii)
}

It runs very slowly: each loop iteration takes several seconds on my laptop. I would like to know what I can do to make it more efficient.

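For example, sub is a variable that changes on every iteration. Should I avoid copying these rows into sub? sub also changes length from one iteration to the next; should I give it more memory when I initialize it?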
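One likely cause of the slowdown is that subset(data, train==t) compares all 10M values of train against t on every iteration, so the loop repeats a full scan about 1k times. A minimal sketch of two alternatives, using a small made-up data frame in place of the real timetable (the toy values and the name idx are assumptions for illustration only):

```r
# Toy stand-in for the timetable; the real data has ~10M rows.
set.seed(1)
data <- data.frame(
  train     = factor(sample(c("T1", "T2", "T3"), 20, replace = TRUE)),
  starttime = as.POSIXct("2014-10-20 00:00:00", tz = "UTC") + sample(0:86399, 20)
)
data <- data[order(data$train, data$starttime), ]

# Rows per train in a single pass, in the order of levels(data$train).
length1 <- tabulate(data$train, nbins = nlevels(data$train))
names(length1) <- levels(data$train)

# If per-train work is still needed, split the row indices once
# instead of scanning the whole table for every level.
idx <- split(seq_len(nrow(data)), data$train)
for (t in names(idx)) {
  sub <- data[idx[[t]], ]   # only this train's rows are copied
  stopifnot(nrow(sub) == length1[[t]])
}
```

Preallocating sub would probably not help, since it is replaced by a new object on each iteration rather than grown in place.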

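PS: What I really want to do is check, for each train, whether the destination city (destiny) of each trip == the start city of the next trip. The code is: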

data = data[order(data$train, data$starttime), ]   # sort according to train, and then according to starttime
sub = data[1,]   # initialization           
for (t in levels(data$train))
{
  sub =  subset(data, train==t)   #subset of each train

  for (i in 1:(nrow(sub)-1)   )
  {
    # if the destiny != the start city of the next trip
    if ( as.character(sub$destiny[i]) != as.character(sub$start[i+1]) )
    {
      # do something
    }
  }
}
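Because the data is already sorted by train and starttime, this check can probably be done without any loop by comparing each row with the row after it. The following is a sketch under that assumption, on a made-up five-row table; the city codes and the names same_train, mismatch and bad_rows are hypothetical:

```r
# Toy timetable: train A has one broken connection (Z -> W), train B has none.
data <- data.frame(
  train     = factor(c("A", "A", "A", "B", "B")),
  start     = c("X", "Y", "W", "P", "Q"),
  destiny   = c("Y", "Z", "X", "Q", "R"),
  starttime = as.POSIXct("2014-10-20 08:00:00", tz = "UTC") + 3600 * c(0, 1, 2, 0, 1),
  stringsAsFactors = TRUE
)
data <- data[order(data$train, data$starttime), ]

n <- nrow(data)
# TRUE where row i and row i + 1 belong to the same train.
same_train <- data$train[-n] == data$train[-1]
# Compare as character: start and destiny are factors with different level sets.
mismatch <- same_train &
  as.character(data$destiny[-n]) != as.character(data$start[-1])

# Positions i where the destiny of trip i differs from the start of trip i + 1.
bad_rows <- which(mismatch)
```

The "do something" step could then act on data[bad_rows, ] in one go, which avoids both the per-train subset and the inner row loop.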

1 answer:

Answer 0: (score: -1)