Make my R code run faster (nested loops)

Time: 2016-02-03 03:06:13

Tags: r performance runtime

Imagine you have a 150 by 5 matrix. Each element contains a random integer from 0 to 20.

Now think of each column of the matrix as independent. I need to loop through every possible combination of one element from each of the 5 columns, which yields 150^5 = 75937500000 combinations.

It is critical that I run every single combination exactly once. The order in which the combinations are run does not matter.

I tried doing this using while loops. See code below.

By my calculation, running this loop would take 54 hours on my laptop.
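One way such an estimate can be obtained is to time a small sample of iterations and extrapolate linearly. The sketch below is only an illustration: the random `giant_matrix` is a hypothetical stand-in for the real data, the sample size of 150^2 iterations is an arbitrary choice, and the loop body omits the "other operations", so the figure it prints will differ from the 54 hours quoted above.

```r
# Hypothetical stand-in for the 150 x 5 matrix described above.
set.seed(42)
giant_matrix <- matrix(sample(0:20, 150 * 5, replace = TRUE), nrow = 150)

# Total number of combinations: 150^5.
n_total <- nrow(giant_matrix)^ncol(giant_matrix)

# Time a small sample: only the two innermost loops (150^2 iterations).
n_sample <- 150^2
elapsed <- system.time({
  for (c4 in 1:150) {
    for (c5 in 1:150) {
      result <- c(giant_matrix[1, 1], giant_matrix[1, 2], giant_matrix[1, 3],
                  giant_matrix[c4, 4], giant_matrix[c5, 5])
    }
  }
})[["elapsed"]]

# Extrapolate linearly to the full run, in hours.
est_hours <- elapsed / n_sample * n_total / 3600
est_hours
```

The result depends heavily on the machine and on how expensive the real per-combination work is, so it should be read as a rough order of magnitude only.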

Questions

  • Is there any way to make my code run faster on my laptop? (bootstrapping?)

  • If not, are there any web R servers I can access that would run my code significantly faster?

  • If not, would it make a significant difference to run this in another, faster language? (Python)

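    # giant_matrix is the 150 x 5 matrix described above.
    # All counters start at the first row.
    counter1 <- counter2 <- counter3 <- counter4 <- counter5 <- 1

    while (counter1 <= 150)
    {
      while (counter2 <= 150)
      {
        while (counter3 <= 150)
        {
          while (counter4 <= 150)
          {
            while (counter5 <= 150)
            {
              # Other operations that take additional time
              result <- c(
                giant_matrix[counter1, 1],
                giant_matrix[counter2, 2],
                giant_matrix[counter3, 3],
                giant_matrix[counter4, 4],
                giant_matrix[counter5, 5])

              counter5 <- counter5 + 1
            }
            # Reset the inner counter before advancing the next one out
            counter5 <- 1
            counter4 <- counter4 + 1
          }
          counter4 <- 1
          counter3 <- counter3 + 1
        }
        counter3 <- 1
        counter2 <- counter2 + 1
      }
      counter2 <- 1
      counter1 <- counter1 + 1
    }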
    

1 answer:

Answer 0 (score: 0)

Here is an approach using expand.grid, shown with a 20 x 5 matrix (100 elements). It produces a data frame with 20^5 = 3,200,000 rows and 5 columns, one row per combination:

    m <- matrix(sample(1:20, 100, replace = TRUE), nrow = 20)
    df <- expand.grid(m[, 1], m[, 2], m[, 3], m[, 4], m[, 5])

Example output (first rows of df, then its dimensions):

    head(df)
      Var1 Var2 Var3 Var4 Var5
    1   10   19   13    4    7
    2   19   19   13    4    7
    3    3   19   13    4    7
    4    5   19   13    4    7
    5   11   19   13    4    7
    6    8   19   13    4    7

    nrow(df)
    [1] 3200000

    dim(df)
    [1] 3200000       5
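Note that the same call on the full 150 x 5 matrix would need 150^5 = 75937500000 rows. With 5 integer columns at 4 bytes each, that is roughly 1.5 TB, which will not fit in memory on a laptop. One possible workaround, sketched below under stated assumptions, is to build only the combinations of the last two columns once with expand.grid and loop over the first three columns, so that each step handles one vectorized chunk. The 6 x 5 matrix is a hypothetical stand-in chosen so the sketch finishes quickly, and `process_chunk` is a made-up placeholder for the "other operations" in the question.

```r
# Hypothetical small stand-in for the 150 x 5 matrix so the sketch runs quickly.
set.seed(1)
m <- matrix(sample(0:20, 6 * 5, replace = TRUE), nrow = 6)
n <- nrow(m)

# All n^2 combinations of columns 4 and 5, built once and reused in every step.
inner <- as.matrix(expand.grid(m[, 4], m[, 5]))

# Placeholder for the real per-combination work (an assumption for this sketch):
# returns the sum of the five values for every combination in the chunk.
process_chunk <- function(a, b, c, inner) {
  a + b + c + rowSums(inner)
}

# Loop over columns 1 to 3; each iteration covers n^2 combinations at once.
visited <- 0
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    for (k in seq_len(n)) {
      res <- process_chunk(m[i, 1], m[j, 2], m[k, 3], inner)
      visited <- visited + length(res)
    }
  }
}

# Check that every combination was handled exactly once.
visited == n^5
```

For the real matrix this would mean 150^3 = 3375000 loop iterations over chunks of 22500 rows, rather than 150^5 scalar iterations. Whether that is fast enough depends on how well the actual per-combination work vectorizes.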