Question

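I have data (the head of the data is shown below) and want to fit a GLM evaluated on "Accuracy". RStudio keeps freezing when I run the final line of code, the one that fits the GLM. I don't know what to do; I'm completely stuck.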

ContractNr Year ValidFrom  ValidThru    Exposure EarnedPremium
1    3006024 2013  1.1.2013  31.3.2013 0,246575342   53,79877695
2    3006024 2013  1.4.2013  22.4.2013 0,060273973   13,48774798
3    3012819 2013  1.1.2013 31.12.2013           1   367,0053327
4    3012819 2014  1.1.2014 31.12.2014           1   367,0053327
5    3012819 2015  1.1.2015  26.4.2015 0,317808219   116,6373112
6    3014874 2013  1.1.2013  28.2.2013 0,161643836   57,71979747
YearlyNetPremium     ClaimNr ClaimDate ClaimYear NClaims   Incurred
1      218,1839288          NA                  NA       0          0
2      223,7740007          NA                  NA       0          0
3      367,0053327 61861914012 21.8.2013      2013       1 1390,86693
4      367,0053327          NA                  NA       0          0
5      367,0053327          NA                  NA       0          0
6       357,080103          NA                  NA       0          0
    Payments Reserve County ConstrYear EngPerfKW Weight BonusMalus Age Gender
1          0       0     GM       1999        40    975          0  51 female
2          0       0     GM       1999        40    975          0  51 female
3 1390,86693       0      L       2003       132   1834         -1  58 female
4          0       0      L       2003       132   1834         -1  59 female
5          0       0      L       2003       132   1834         -1  60 female
6          0       0     PE       2004        55   1318          0  79   male
  ClaimReason    Make Telematics CarAge G_EngPerfKW G_Weight G_Age
1          NA Renault          0     16          25      500    50
2          NA Renault          0     16          25      500    50
3           1    Audi          0     12         125     1500    50
4          NA    Audi          0     12         125     1500    50
5          NA    Audi          0     12         125     1500    60
6          NA    Opel          0     11          50     1000    70

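What I want to do is compute attribute weights for "NClaims", which is binary, and then build a GLM from them. I tried something similar with machine learning (train/test data) and it worked.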

library(caret)
library(FSelector)
set.seed(42)
dataset<-read.csv(file.choose(),header=T,sep=";")
str(dataset)
dataset$NClaims[is.na(dataset$NClaims)]<-names(which.max(table(dataset$NClaims)))
dataset$ClaimReason<-NULL
dataset$ClaimNr<-NULL
dataset$ClaimDate<-NULL
dataset$ClaimYear<-NULL
dataset$Incurred<-NULL
dataset$Payments<-NULL
dataset$Reserve<-NULL
colSums(is.na(dataset))
dataset$ValidFrom<-NULL
dataset$ValidThru<-NULL
dataset$County<-NULL
dataset$Gender<-NULL
dataset$Make<-NULL
weights_info_gain<-information.gain(NClaims ~ ., data=dataset)
weights_info_gain
weights_gain_ratio = gain.ratio(NClaims ~ ., data=dataset)
weights_gain_ratio
most_important_attributes <- cutoff.k(weights_gain_ratio, 20)
most_important_attributes
formula_with_most_important_attributes <- as.simple.formula(most_important_attributes, "NClaims")
formula_with_most_important_attributes
fitCtrl = trainControl(method="repeatedcv", number=5, repeats=3)
modelGLM = train(formula_with_most_important_attributes, data=dataset, method="glm", trControl=fitCtrl, metric="Accuracy",na.action = na.pass)

I have dropped the dates and the columns I wasn't sure a GLM would accept (such as "Make", or anything else that isn't numeric). Thank you for your help!!

Answer 1

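I re-sampled the data from the question to create a larger data set to answer it. The code runs fine on the sample data set after these changes:

Replaced the decimal commas ',' in the columns 'YearlyNetPremium', 'Exposure', 'EarnedPremium' and others with dots '.'
Added Incurred to Payments (they are set to NA's later on)
Converted blank rows to NULL or strings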
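One plausible reason the decimal commas matter: a column such as 'Exposure' that contains values like 0,246575342 is not read as numeric, so a model formula may expand it into one dummy variable per distinct value, which can make the fit extremely slow. The sketch below is a minimal illustration, not the answer's own code. The two-row `csv_text` string and the names `d1`, `d2` and `comma_cols` are assumptions made only for this example.

```r
# Two illustrative rows in the same layout as the question's file:
# semicolon-separated fields, comma as the decimal mark.
csv_text <- "ContractNr;Exposure;EarnedPremium;NClaims
3006024;0,246575342;53,79877695;0
3012819;1;367,0053327;1"

# Option 1: tell read.csv about the decimal mark when reading.
d1 <- read.csv(text = csv_text, sep = ";", dec = ",")

# Option 2: read with the default decimal mark, then repair the columns.
d2 <- read.csv(text = csv_text, sep = ";", stringsAsFactors = FALSE)
comma_cols <- c("Exposure", "EarnedPremium")  # hypothetical selection
d2[comma_cols] <- lapply(
  d2[comma_cols],
  function(x) as.numeric(gsub(",", ".", x, fixed = TRUE))
)

# Both columns should now be reported as numeric.
str(d1)
str(d2)
```

Passing `dec = ","` at read time is usually the simpler route; the `gsub` variant is useful when the data frame has already been loaded.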

The code below seems to work for the sample data above:

Import libraries

library(lubridate)
library(caret)
library(FSelector)

Create sample data by sampling the data in the question

The code from the question above
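As a rough illustration of the resampling idea described in this answer (not the answer's actual code), the sketch below draws rows with replacement from a small data frame to get a larger one, then fits a binomial GLM with base R's `glm`. The six rows are copied from the question's head output. The two-column subset, the target of 200 rows, the names `small`, `big` and `fit`, and the single-predictor formula are assumptions made only for this example.

```r
set.seed(42)

# Six rows taken from the question's sample (only two columns kept).
small <- data.frame(
  Age     = c(51, 51, 58, 59, 60, 79),
  NClaims = c(0, 0, 1, 0, 0, 0)
)

# Resample rows with replacement to build a larger data set.
# The size of 200 is arbitrary.
big <- small[sample(nrow(small), 200, replace = TRUE), ]
rownames(big) <- NULL

# Logistic regression on the 0/1 claim indicator; family = binomial
# is the usual choice for a binary response.
fit <- glm(NClaims ~ Age, data = big, family = binomial)
summary(fit)
```

Resampled rows are copies of the originals, so a model fitted this way only checks that the code runs; it says nothing about real predictive accuracy. If the fit is done through `caret::train` with `metric = "Accuracy"`, the response would likely need to be a factor so that caret treats the task as classification.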
