Nonlinear regression in R with multiple categorical independent variables

Time: 2018-02-03 10:07:49

Tags: r categorical-data non-linear-regression

I have to perform a nonlinear multiple regression with data that looks like the following:

ID    Customer   Country   Industry      Machine-type    Service hours
1     A          China     mass          A1              120
2     B          Europe    customized    A2              400
3     C          US        mass          A1               60
4     D          Rus       mass          A3              250
5     A          China     mass          A2              480
6     B          Europe    customized    A1              300
7     C          US        mass          A4              250
8     D          Rus       customized    A2              260
9     A          China     Customized    A2              310
10    B          Europe    mass          A1              110
11    C          US        Customized    A4               40
12    D          Rus       customized    A2              80

Dependent variable: Service hours
Independent variables: Customer, Country, Industry, Machine type

I did a linear regression, but because the assumption of linearity does not hold I have to perform a nonlinear regression.

I know nonlinear regression can be done with the nls function. How do I add the categorical variables to the nonlinear regression so that I get the statistical summary in R?

Column names after adding dummies:

ID  Customer.a  Customer.b  Customer.c  Customer.d  Country.China   Country.Europe  Country.Rus Country.US  Industry.customized industry.Customized Industry.mass   Machine type.A1 Machine type.A2 Machine type.A3 Service hours
1 1 0 0 0 1 0 0 0 0 0 1 1 0 0 120 
2 0 1 0 0 0 1 0 0 1 0 0 0 1 0 400 
3 0 0 1 0 0 0 0 1 0 0 1 0 0 1 60 
4 0 0 0 1 0 0 1 0 0 0 1 1 0 0 250 
5 1 0 0 0 1 0 0 0 1 0 0 0 0 1 480 
6 0 1 0 0 0 1 0 0 0 1 0 1 0 0 300 
7 0 0 1 0 0 0 0 1 0 0 1 0 0 1 250 
8 0 0 0 1 0 0 1 0 1 0 0 0 1 0 260 
9 1 0 0 0 1 0 0 0 0 0 1 0 1 0 210 
10 0 1 0 0 0 1 0 0 1 0 0 0 1 0 110 
11 0 0 1 0 0 0 0 1 0 0 1 0 0 1 40 
12 0 0 0 1 0 0 1 0 0 0 1 1 0 0 80

1 answer:

Answer 0 (score: 0)

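How you handle a categorical predictor depends on the number of levels it can take.

A predictor such as gender, which takes only 2 values (male or female), can simply be represented as a binary (1, 0) variable.

For predictors with more than 2 levels, we use 1-of-k dummy coding, where k is the number of levels of that variable. See the dummies package for useful functions.

After this, you can fit the model using the formula: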
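As a minimal sketch of the dummy coding described above, base R's model.matrix() can expand factor columns into 0/1 indicators without an extra package. The data frame below is copied from the question; the name df, the column names Machine.type and Service.hours (spaces and hyphens replaced with dots), and the use of tolower() to merge the "Customized"/"customized" spellings are assumptions made for illustration.

```r
# Data copied from the question; column names adapted to valid R names.
df <- data.frame(
  Customer      = rep(c("A", "B", "C", "D"), times = 3),
  Country       = rep(c("China", "Europe", "US", "Rus"), times = 3),
  Industry      = c("mass", "customized", "mass", "mass", "mass", "customized",
                    "mass", "customized", "Customized", "mass", "Customized",
                    "customized"),
  Machine.type  = c("A1", "A2", "A1", "A3", "A2", "A1",
                    "A4", "A2", "A2", "A1", "A4", "A2"),
  Service.hours = c(120, 400, 60, 250, 480, 300, 250, 260, 310, 110, 40, 80)
)

# Merge the two spellings of "customized" so they form a single level.
df$Industry     <- factor(tolower(df$Industry))
df$Machine.type <- factor(df$Machine.type)

# Expand the factors into indicator columns. With an intercept in the
# formula, one reference level per factor is dropped (k - 1 columns).
X <- model.matrix(~ Industry + Machine.type, data = df)
colnames(X)
head(X)
```

Dropping a reference level avoids the exact collinearity that arises when all k indicator columns of a factor are combined with an intercept.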

nls(Service.hours ~ b0 + b1 * predictor1 + b2 * predictor2 + bN * predictorN,
    data = df, start = list(b0 = 0, b1 = 0, b2 = 0, bN = 0))
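To make the call above concrete: nls() expects the formula to contain named parameters, and it needs starting values for them. The sketch below fits one dummy-coded predictor from the question's data. The exponential mean function exp(b0 + b1 * mass), the parameter names b0 and b1, and the starting values are assumptions chosen only for illustration; the question does not say which nonlinear form is appropriate.

```r
# Data copied from the question (only the columns used here).
df <- data.frame(
  Industry      = c("mass", "customized", "mass", "mass", "mass", "customized",
                    "mass", "customized", "Customized", "mass", "Customized",
                    "customized"),
  Service.hours = c(120, 400, 60, 250, 480, 300, 250, 260, 310, 110, 40, 80)
)

# 0/1 dummy for the "mass" level; "customized" is the reference level.
# tolower() merges the "Customized"/"customized" spellings.
df$mass <- as.numeric(tolower(df$Industry) == "mass")

# Assumed mean function: exp(b0 + b1 * mass).
# b0 starts at the log of the overall mean, b1 at 0 (no group difference).
fit <- nls(Service.hours ~ exp(b0 + b1 * mass),
           data  = df,
           start = list(b0 = log(mean(df$Service.hours)), b1 = 0))

# Coefficient table with standard errors, t values and p values.
summary(fit)
```

Further dummy columns can be added the same way, each with its own parameter and starting value. If every level of a factor gets a dummy column alongside an intercept-like term such as b0, the parameters are not identifiable and nls() typically stops with a singular gradient error, so one level per factor should be left out as the reference.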