In R, how can I compute the summary function in parallel?

Time: 2017-12-25 17:21:29

Tags: r parallel-processing nnet

I have a huge dataset. I fitted a multinomial regression on it with multinom from the nnet package.

The fit takes 10 minutes. But when I use the summary function to compute the coefficients, it takes more than 1 day! This is the code I used:

output <- summary(mylogit) 

Coef<-t(as.matrix(output$coefficients))

I was wondering if anyone knows how to run this part of the code with parallel processing in R?

Here is a small part of the data:

mydata:
to  RealAge
513 59.608
513 84.18
0   85.23
119 74.764
116 65.356
0   89.03
513 92.117
69  70.243
253 88.482
88  64.23
513 64
4   84.03
65  65.246
69  81.235
513 87.663
513 81.21
17  75.235
117 49.112
69  59.019
20  90.03

1 answer:

Answer 0: (score: 0)

If you only want the coefficients, use just the coef() method, which does far less computation.

Example:

mydata <- readr::read_table("to  RealAge
513 59.608
513 84.18
0   85.23
119 74.764
116 65.356
0   89.03
513 92.117
69  70.243
253 88.482
88  64.23
513 64
4   84.03
65  65.246
69  81.235
513 87.663
513 81.21
17  75.235
117 49.112
69  59.019
20  90.03")[rep(1:20, 3000), ]

mylogit <- nnet::multinom(to ~ RealAge, mydata)
system.time(output <- summary(mylogit))          # 6 sec
all.equal(output$coefficients, coef(mylogit))    # TRUE & super fast
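Since the question only needs the transposed coefficient matrix, the two lines from the question can probably be replaced by a single coef() call. The following is a minimal sketch under assumptions: the toy data frame and the names toy and fit are made up for illustration and are not the asker's data.

```r
library(nnet)

set.seed(42)
# Hypothetical toy data (an assumption, not the asker's dataset):
# three outcome classes and one numeric predictor.
toy <- data.frame(
  to      = factor(sample(c(0, 69, 513), 600, replace = TRUE)),
  RealAge = runif(600, 45, 95)
)

fit <- multinom(to ~ RealAge, data = toy, trace = FALSE)

# Stand-in for: output <- summary(fit); Coef <- t(as.matrix(output$coefficients))
# coef() reads the fitted weights directly and never builds the Hessian.
Coef <- t(coef(fit))
print(Coef)
```

Note that coef() returns only the estimates. Standard errors still require summary() or vcov(), which is the expensive part.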

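If you profile the summary() function, you will see that crossprod() takes up most of the time. So if you really want the output of summary(), you can use an optimized math library, such as the MKL that ships with Microsoft R Open.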
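One way to check where summary() spends its time is base R's sampling profiler, Rprof(). This is only a sketch: the toy data and the names toy, fit, and prof_file are assumptions for illustration, and which functions show up at the top of the profile (crossprod() or others) may depend on the R and nnet versions in use.

```r
library(nnet)

set.seed(1)
# Hypothetical toy data (an assumption, not the asker's dataset).
toy <- data.frame(
  to      = factor(sample(c(0, 69, 253, 513), 20000, replace = TRUE)),
  RealAge = runif(20000, 45, 95)
)
fit <- multinom(to ~ RealAge, data = toy, trace = FALSE)

# Profile only the summary() call, sampling the call stack every 10 ms.
prof_file <- tempfile(fileext = ".out")
Rprof(prof_file, interval = 0.01)
out <- summary(fit)
Rprof(NULL)

# Functions ranked by total time, including time spent in their callees.
prof <- summaryRprof(prof_file)$by.total
print(head(prof, 10))
```

If the top entries are dominated by the Hessian computation, a faster BLAS may help only partly, so it is worth comparing timings before and after switching libraries.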