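So... I'm using Microsoft R Client to do some model fitting. This gives me access to the revoScaleR package (formerly produced by Revolution Analytics), which provides alternative model-fitting algorithms to R's built-in ones. I haven't used them before, so as a quick initial test I took a small data set and fitted a number of (trivial) models using both the base R functions (lm and glm) and the revoScaleR functions (rxGlm and rxLogit). They should all give the same answer... but they don't.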
### 1. lm - gives correct answer
Propensity.Model <- lm(Large_Claim_Proportion ~ 1,
                       data = Data.Set.NZ.Claim.Version,
                       weights = NZ.Claim.Count)
Propensity.Model$fitted.values[1] # gives correct answer of 0.01837672
### 2. glm (logistic) - gives correct answer
Propensity.Model <- glm(Large_Claim_Proportion ~ 1,
                        data = Data.Set.NZ.Claim.Version,
                        weights = NZ.Claim.Count,
                        family = binomial(link = "logit"))
Propensity.Model$fitted.values[1] # gives correct answer of 0.01837672
### 3. rxLogit - giving wrong answer
Propensity.Model <- rxLogit(Large_Claim_Proportion ~ 1,
                            data = Data.Set.NZ.Claim.Version,
                            fweights = "NZ.Claim.Count")
Predicted_Values <- rxPredict(modelObject = Propensity.Model,
                              data = Data.Set.NZ.Claim.Version)
Predicted_Values[1,] # gives wrong answer of 0.0165901
### 4. rxGlm (logistic) (alternative to rxLogit) - giving wrong answer
Propensity.Model <- rxGlm(Large_Claim_Proportion ~ 1,
                          data = Data.Set.NZ.Claim.Version,
                          fweights = "NZ.Claim.Count",
                          family = binomial(link = "logit"))
Predicted_Values <- rxPredict(modelObject = Propensity.Model,
                              data = Data.Set.NZ.Claim.Version)
Predicted_Values[1,] # gives wrong answer of 0.0165901
### 5. rxGlm (log-binomial) - giving correct answer (although log-binomial isn't a natural pairing)
Propensity.Model <- rxGlm(Large_Claim_Proportion ~ 1,
                          data = Data.Set.NZ.Claim.Version,
                          fweights = "NZ.Claim.Count",
                          family = binomial(link = "log"))
Predicted_Values <- rxPredict(modelObject = Propensity.Model,
                              data = Data.Set.NZ.Claim.Version)
Predicted_Values[1,] # gives correct answer of 0.01837672
The Large_Claim_Proportion term is always in the range [0,1], so it should be suitable as the dependent variable in a logistic regression fit.
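To spell out why I expect the same number from every intercept-only fit: with only an intercept, weighted least squares and a weighted binomial fit should both reduce to the weighted mean of the response, sum(w * y) / sum(w). Here is a minimal base-R sketch of that on a made-up data frame (toy_data, its column names and its values are hypothetical, not my real data). It only uses lm, glm and weighted.mean, so it says nothing about how revoScaleR treats fweights.

```r
# Hypothetical toy data (NOT the real data set): 4 groups of claims.
toy_data <- data.frame(
  claim_count  = c(10, 20, 50, 20),  # claims per group, used as weights
  large_claims = c(1, 0, 2, 1)       # large claims per group
)
toy_data$large_claim_proportion <- toy_data$large_claims / toy_data$claim_count

# Weighted mean of the proportions: 4 large claims out of 100 claims.
expected <- weighted.mean(toy_data$large_claim_proportion, toy_data$claim_count)

# Intercept-only weighted linear model.
lm_fit <- lm(large_claim_proportion ~ 1,
             data = toy_data,
             weights = claim_count)

# Intercept-only weighted logistic model (proportion response, counts as weights).
glm_fit <- glm(large_claim_proportion ~ 1,
               data = toy_data,
               weights = claim_count,
               family = binomial(link = "logit"))

lm_value  <- unname(fitted(lm_fit)[1])
glm_value <- unname(fitted(glm_fit)[1])

# All three values should agree up to the convergence tolerance of glm.
print(c(expected = expected, lm = lm_value, glm = glm_value))
```

On this toy data the three numbers agree, which matches what I see from lm and glm on the real data in cases 1 and 2.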
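If anyone reading this has any insight into how these functions work and what might be going wrong, I'd be very grateful. I'm quite used to fitting GLMs in other contexts, but I'm relatively new to R, so I suspect I've made some basic rookie coding error.
Thanks.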