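I have a huge dataset. I fitted a multinomial regression with multinom from the nnet package:
mylogit <- multinom(to ~ RealAge, mydata)
That takes 10 minutes. But when I use the summary function to get the coefficients, it takes more than a day! This is the code I used:
output <- summary(mylogit)
Coef <- t(as.matrix(output$coefficients))
Does anyone know how I could run this part of the code with parallel processing in R?
Here is a small sample of the data:
mydata:
to RealAge
513 59.608
513 84.18
0 85.23
119 74.764
116 65.356
0 89.03
513 92.117
69 70.243
253 88.482
88 64.23
513 64
4 84.03
65 65.246
69 81.235
513 87.663
513 81.21
17 75.235
117 49.112
69 59.019
20 90.03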
Answer 0 (score: 0)
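If you only need the coefficients, use just the coef() method, which does less computation: it extracts the fitted coefficients, whereas summary() also computes standard errors.
Example: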
mydata <- readr::read_table("to RealAge
513 59.608
513 84.18
0 85.23
119 74.764
116 65.356
0 89.03
513 92.117
69 70.243
253 88.482
88 64.23
513 64
4 84.03
65 65.246
69 81.235
513 87.663
513 81.21
17 75.235
117 49.112
69 59.019
20 90.03")[rep(1:20, 3000), ]
mylogit <- nnet::multinom(to ~ RealAge, mydata)
system.time(output <- summary(mylogit)) # 6 sec
all.equal(output$coefficients, coef(mylogit)) # TRUE & super fast
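The two lines from the question can then be replaced by a single call that skips summary() entirely. The sketch below rebuilds a small model from the question's sample values using base R only; the replication factor of 50 and trace = FALSE are arbitrary choices made here to keep it quick and quiet.

```r
library(nnet)

# Sample values copied from the question, replicated a few times
mydata <- data.frame(
  to = c(513, 513, 0, 119, 116, 0, 513, 69, 253, 88,
         513, 4, 65, 69, 513, 513, 17, 117, 69, 20),
  RealAge = c(59.608, 84.18, 85.23, 74.764, 65.356, 89.03, 92.117, 70.243,
              88.482, 64.23, 64, 84.03, 65.246, 81.235, 87.663, 81.21,
              75.235, 49.112, 59.019, 90.03)
)[rep(1:20, 50), ]

mylogit <- multinom(to ~ RealAge, mydata, trace = FALSE)

# Same shape as t(as.matrix(summary(mylogit)$coefficients)),
# but without the standard-error computation
Coef <- t(as.matrix(coef(mylogit)))
dim(Coef)  # rows: model terms; columns: non-reference levels of `to`
```

as.matrix() is kept so the code also works when the response has only two levels, in which case coef() returns a plain vector.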
If you profile the summary() function, you will see that crossprod() takes up most of the time.
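One way to check where the time goes is R's built-in sampling profiler. The sketch below rebuilds the example model from the question's sample values and profiles summary() with Rprof(). The temporary file name and the replication factor of 1000 are arbitrary choices, and the exact breakdown will depend on your machine and R build.

```r
library(nnet)

# Sample values copied from the question, replicated so that summary()
# runs long enough for the profiler to collect samples
mydata <- data.frame(
  to = c(513, 513, 0, 119, 116, 0, 513, 69, 253, 88,
         513, 4, 65, 69, 513, 513, 17, 117, 69, 20),
  RealAge = c(59.608, 84.18, 85.23, 74.764, 65.356, 89.03, 92.117, 70.243,
              88.482, 64.23, 64, 84.03, 65.246, 81.235, 87.663, 81.21,
              75.235, 49.112, 59.019, 90.03)
)[rep(1:20, 1000), ]

mylogit <- multinom(to ~ RealAge, mydata, trace = FALSE)

prof_file <- tempfile(fileext = ".out")
Rprof(prof_file)             # start the sampling profiler
output <- summary(mylogit)   # the call under investigation
Rprof(NULL)                  # stop profiling

prof <- summaryRprof(prof_file)
head(prof$by.self)           # functions ranked by time spent in their own code
```

The by.self table lists time spent inside each function itself, while prof$by.total also includes time spent in the functions it calls.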
So if you really need the output of summary(), you could use an optimized math library, such as the MKL that ships with Microsoft R Open.
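Before switching R distributions, it can help to see which BLAS/LAPACK your session already uses and how fast crossprod() runs on it. This is only a rough sketch: the matrix size is an arbitrary assumption, and the timing indicates dense matrix performance in general, not how much summary() itself would gain.

```r
# Which BLAS / LAPACK shared libraries is this R session linked against?
# (May be an empty string on some platforms.)
extSoftVersion()["BLAS"]
La_library()

# Rough timing of crossprod() on a dense matrix of arbitrary size
set.seed(1)
X <- matrix(rnorm(2000 * 500), nrow = 2000)
system.time(XtX <- crossprod(X))  # computes t(X) %*% X
```

Running the same snippet under the reference BLAS and under an optimized one gives a quick comparison of the two setups.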