Question

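I'm trying to port some R code to Java/Scala and need an equivalent of the glm function. Are there any Java/Scala libraries for fitting generalized linear models with quasi-Poisson errors and a log link function?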

So far I've found:

Suanshu, but I can't figure out how to get the hat matrix.
This question, but I can't find the glmulti package it refers to, only an R package of the same name.

I don't have the knowledge to build my own solver.
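For reference, the core of such a solver is smaller than it might seem. A quasi-Poisson log-link model has the same coefficients as a Poisson fit (log μ = Xβ), computed by iteratively reweighted least squares (IRLS); the quasi part only adds a dispersion estimate φ = Σ(y − μ)²/μ / (n − p), the Pearson statistic that R's summary.glm uses to scale the standard errors. The following is a minimal, dependency-free sketch under those assumptions: the class and method names are hypothetical, it starts from μ = y + 0.1 as R's poisson() family does, and it has none of the safeguards (step-halving, rank checks) of a real library.

```java
import java.util.Arrays;

// Minimal sketch, not a library API: fits a Poisson GLM with a log link by
// iteratively reweighted least squares (IRLS), then estimates the
// quasi-Poisson dispersion as Pearson chi-square / residual degrees of freedom.
// Quasi-Poisson shares the Poisson coefficients; only the dispersion differs.
final class QuasiPoissonGlm {

    // Solves the small dense system a * v = b (Gaussian elimination, partial pivoting).
    static double[] solve(double[][] a, double[] b) {
        int n = b.length;
        double[][] m = new double[n][];
        for (int i = 0; i < n; i++) {
            m[i] = Arrays.copyOf(a[i], n + 1); // augmented row [a_i | b_i]
            m[i][n] = b[i];
        }
        for (int col = 0; col < n; col++) {
            int piv = col;
            for (int r = col + 1; r < n; r++)
                if (Math.abs(m[r][col]) > Math.abs(m[piv][col])) piv = r;
            double[] tmp = m[col]; m[col] = m[piv]; m[piv] = tmp;
            for (int r = col + 1; r < n; r++) {
                double f = m[r][col] / m[col][col];
                for (int c = col; c <= n; c++) m[r][c] -= f * m[col][c];
            }
        }
        double[] v = new double[n];
        for (int i = n - 1; i >= 0; i--) {
            double s = m[i][n];
            for (int c = i + 1; c < n; c++) s -= m[i][c] * v[c];
            v[i] = s / m[i][i];
        }
        return v;
    }

    // Fitted means under the log link: mu_i = exp(x_i . beta).
    static double[] fittedMeans(double[][] x, double[] beta) {
        double[] mu = new double[x.length];
        for (int i = 0; i < x.length; i++) {
            double eta = 0;
            for (int j = 0; j < beta.length; j++) eta += x[i][j] * beta[j];
            mu[i] = Math.exp(eta);
        }
        return mu;
    }

    // IRLS fit; x must already include a column of 1s if an intercept is wanted.
    static double[] fit(double[][] x, double[] y, int maxIter, double tol) {
        int n = y.length, p = x[0].length;
        double[] mu = new double[n];
        for (int i = 0; i < n; i++) mu[i] = y[i] + 0.1; // same starting values as R's poisson()
        double[] beta = new double[p];
        for (int iter = 0; iter < maxIter; iter++) {
            // Weighted normal equations (X'WX) beta = X'Wz, with W = diag(mu)
            // and working response z = log(mu) + (y - mu) / mu.
            double[][] xtwx = new double[p][p];
            double[] xtwz = new double[p];
            for (int i = 0; i < n; i++) {
                double w = mu[i], z = Math.log(mu[i]) + (y[i] - mu[i]) / mu[i];
                for (int j = 0; j < p; j++) {
                    xtwz[j] += x[i][j] * w * z;
                    for (int k = 0; k < p; k++) xtwx[j][k] += x[i][j] * w * x[i][k];
                }
            }
            double[] next = solve(xtwx, xtwz);
            double change = 0;
            for (int j = 0; j < p; j++) change = Math.max(change, Math.abs(next[j] - beta[j]));
            beta = next;
            mu = fittedMeans(x, beta);
            if (iter > 0 && change < tol) break; // stop once the coefficients settle
        }
        return beta;
    }

    // Quasi-Poisson dispersion: sum((y - mu)^2 / mu) / (n - p).
    static double dispersion(double[][] x, double[] y, double[] beta) {
        double[] mu = fittedMeans(x, beta);
        double chi2 = 0;
        for (int i = 0; i < y.length; i++) chi2 += (y[i] - mu[i]) * (y[i] - mu[i]) / mu[i];
        return chi2 / (y.length - beta.length);
    }
}
```

As a sanity check, an intercept-only fit should return log(mean(y)), and a model with an intercept plus a 0/1 group indicator should reproduce the log of each group's mean.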

Update: I should have asked for free/open source. Suanshu seems to cost $1500 for a single MAC address.

Answer 1

Are there any Java/Scala libraries for solving generalized linear models with quasi-Poisson errors and a log link function?

So far I've found:

Suanshu, but I can't figure out how to get the hat matrix.

It looks like you can get the projection matrix or the hat values in Suanshu by following this example:

Examples/src/com/numericalmethod/suanshu/examples/LinearRegression.java

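Below is an outline of the example code, though it sounds like you may want to build a more specific GLMProblem rather than the LMProblem base class created here: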

LMProblem problem = new LMProblem( 
    new DenseVector(new double[]{2.32, 0.452, 4.53, 12.34, 32.2}), 
    new DenseMatrix(new double[][]{ 
        {1.52, 2.23, 4.31}, 
        {3.22, 6.34, 3.46}, 
        {4.32, 12.2, 23.1}, 
        {10.1034, 43.2, 22.3}, 
        {12.1, 2.12, 3.27} 
    }), 
    true); 

OLSRegression regression = new OLSRegression(problem);
OLSResiduals residuals = regression.residuals();

ImmutableVector hatValues = residuals.leverage(); // gets the leverage (R hatvalues)
ImmutableMatrix hHat = residuals.hHat(); // gets the projection matrix, H-hat
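If you end up outside Suanshu (for example because of the license cost), the hat matrix is easy to compute directly. The OLS version is H = X(XᵀX)⁻¹Xᵀ; for a GLM, R's hatvalues() uses the weighted version H = W^{1/2} X(XᵀWX)⁻¹Xᵀ W^{1/2} evaluated at the converged IRLS fit, where for a (quasi-)Poisson model with a log link the working weights are simply the fitted means, wᵢ = μᵢ (the dispersion does not enter). The class below is a hypothetical, dependency-free sketch of that formula, not the Suanshu API; it assumes small, well-conditioned dense problems.

```java
import java.util.Arrays;

// Minimal sketch, not the Suanshu API: weighted hat matrix
// H = W^{1/2} X (X'WX)^{-1} X' W^{1/2}. With all weights equal to 1 this is
// the OLS hat matrix X (X'X)^{-1} X'; with w_i = mu_i from a converged
// log-link (quasi-)Poisson fit it is the GLM hat matrix.
final class HatMatrix {

    // Prepends a column of 1s (the intercept) to every row of x.
    static double[][] withIntercept(double[][] x) {
        double[][] out = new double[x.length][];
        for (int i = 0; i < x.length; i++) {
            out[i] = new double[x[i].length + 1];
            out[i][0] = 1.0;
            System.arraycopy(x[i], 0, out[i], 1, x[i].length);
        }
        return out;
    }

    // Solves a * v = b by Gaussian elimination with partial pivoting.
    static double[] solve(double[][] a, double[] b) {
        int n = b.length;
        double[][] m = new double[n][];
        for (int i = 0; i < n; i++) {
            m[i] = Arrays.copyOf(a[i], n + 1); // augmented row [a_i | b_i]
            m[i][n] = b[i];
        }
        for (int col = 0; col < n; col++) {
            int piv = col;
            for (int r = col + 1; r < n; r++)
                if (Math.abs(m[r][col]) > Math.abs(m[piv][col])) piv = r;
            double[] tmp = m[col]; m[col] = m[piv]; m[piv] = tmp;
            for (int r = col + 1; r < n; r++) {
                double f = m[r][col] / m[col][col];
                for (int c = col; c <= n; c++) m[r][c] -= f * m[col][c];
            }
        }
        double[] v = new double[n];
        for (int i = n - 1; i >= 0; i--) {
            double s = m[i][n];
            for (int c = i + 1; c < n; c++) s -= m[i][c] * v[c];
            v[i] = s / m[i][i];
        }
        return v;
    }

    // H[i][j] = sqrt(w_i w_j) * x_i' (X'WX)^{-1} x_j
    static double[][] hat(double[][] x, double[] w) {
        int n = x.length, p = x[0].length;
        double[][] xtwx = new double[p][p];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < p; j++)
                for (int k = 0; k < p; k++) xtwx[j][k] += x[i][j] * w[i] * x[i][k];
        double[][] h = new double[n][n];
        for (int j = 0; j < n; j++) {
            double[] v = solve(xtwx, x[j]); // v = (X'WX)^{-1} x_j
            for (int i = 0; i < n; i++) {
                double s = 0;
                for (int k = 0; k < p; k++) s += x[i][k] * v[k];
                h[i][j] = Math.sqrt(w[i] * w[j]) * s;
            }
        }
        return h;
    }

    // Leverages (R's hatvalues) are the diagonal of H.
    static double[] leverage(double[][] h) {
        double[] lev = new double[h.length];
        for (int i = 0; i < h.length; i++) lev[i] = h[i][i];
        return lev;
    }
}
```

With the design matrix from the snippet above plus an intercept (assuming the `true` argument to LMProblem adds one, as it appears to), there are 4 coefficients, so the leverages from `HatMatrix.leverage(HatMatrix.hat(HatMatrix.withIntercept(x), w))` with `w` all ones should sum to 4 and be comparable with what `residuals.leverage()` reports.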

Hope that helps.

Other possibilities

See also Logistic Regression in Java.

Answer 2

In the end I used a combination of Rserve, rjson and json4s (working in Scala).

I prefer to pass data into and out of Rserve as JSON, using rjson on the R side and json4s on the Scala side.

Push data to R, run commands, get the results (untested):

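    import org.json4s._
    import org.json4s.JsonDSL._
    import org.json4s.jackson.JsonMethods._
    import org.rosuda.REngine.Rserve.RConnection

    val rConnection = new RConnection() // assumes Rserve is already running locally
    val rInput =
        ("MyTable" ->
            ("ColA" -> (1 to 10)) ~
            ("ColB" -> (11 to 20))
        ) ~
        ("config" -> 5)
    rConnection.assign("jsonIn", compact(render(rInput)))
    val rExpression = rConnection.parseAndEval("""
        library(rjson)
        parsedJSON = fromJSON(jsonIn)
        myData = as.data.frame(parsedJSON$MyTable)
        config = parsedJSON$config
        results = various.r.commands
        toJSON(list("res1" = results[[1]], "res2" = results[[2]]))
    """)
    val rOut = parse(rExpression.asString())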

Get an Rserve connection, starting the Rserve process if necessary:

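import java.io.InputStream
import java.net.ConnectException
import scala.annotation.tailrec
import scala.io.Source
import scala.sys.process.{Process, ProcessIO}
import scala.util.{Failure, Success, Try}
import org.rosuda.REngine.Rserve.RConnection
// `log` is assumed to be an SLF4J-style logger in scope.

def ensureRunning(initialAttempts: Int = 1, postStartAttempts: Int = 10, daemonizeThreads: Boolean = true){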
    getConnection(initialAttempts)
        .recoverWith{case _ => 
            startRserve(daemonizeThreads)
            getConnection(postStartAttempts)
        }
        .map(_.close())
        .recover{case _ => throw new Exception("Failed to find or start Rserve")}
        .get
}

@tailrec
def getConnection(countdown: Int): Try[RConnection] = {
    if(countdown == 0) Failure(new ConnectException("Could not connect to Rserve"))
    else try{
        val c = new RConnection()
        log.debug("Rserve connection confirmed")
        Success(c)
    } catch {
        case _: Exception => 
            val newCountdown = countdown - 1
            log.debug("Searching for Rserve {} tries left",newCountdown)
            Thread.sleep(100)
            getConnection(newCountdown)
    }
}

private def startRserve(daemonizeThreads: Boolean){
    implicit def toLines(in: InputStream) = Source.fromInputStream(in).getLines

    log.info("Starting new Rserve process (daemon = {})", daemonizeThreads)
    val io = 
        new ProcessIO(
                in => in.close,
                out => {
                    out.foreach(log.info)
                    out.close   
                },
                err => {
                    err.foreach(log.error)
                    err.close   
                },
                daemonizeThreads
        )
    Process("R CMD Rserve --no-save --slave").run(io)
}

Answer 3

Try https://github.com/chen0040/java-glm, which supports various distribution families.

Java library for generalized linear models
