Question

I have to perform a nonlinear multiple regression with data that looks like the following:

ID    Customer   Country   Industry      Machine-type    Service hours
1     A          China     mass          A1              120
2     B          Europe    customized    A2              400
3     C          US        mass          A1               60
4     D          Rus       mass          A3              250
5     A          China     mass          A2              480
6     B          Europe    customized    A1              300
7     C          US        mass          A4              250
8     D          Rus       customized    A2              260
9     A          China     Customized    A2              310
10    B          Europe    mass          A1              110
11    C          US        Customized    A4               40
12    D          Rus       customized    A2              80

Dependent variable: Service hours
Independent variables: Customer, Country, Industry, Machine type

I did a linear regression, but because the assumption of linearity does not hold, I have to perform a nonlinear regression.

I know nonlinear regression can be done with the nls function. How do I add the categorical variables to the nonlinear regression so that I get the statistical summary in R?

Column names and values after adding dummy variables:

ID  Customer.a  Customer.b  Customer.c  Customer.d  Country.China   Country.Europe  Country.Rus Country.US  Industry.customized Industry.Customized Industry.mass   Machine type.A1 Machine type.A2 Machine type.A3 Service hours
1 1 0 0 0 1 0 0 0 0 0 1 1 0 0 120 
2 0 1 0 0 0 1 0 0 1 0 0 0 1 0 400 
3 0 0 1 0 0 0 0 1 0 0 1 0 0 1 60 
4 0 0 0 1 0 0 1 0 0 0 1 1 0 0 250 
5 1 0 0 0 1 0 0 0 1 0 0 0 0 1 480 
6 0 1 0 0 0 1 0 0 0 1 0 1 0 0 300 
7 0 0 1 0 0 0 0 1 0 0 1 0 0 1 250 
8 0 0 0 1 0 0 1 0 1 0 0 0 1 0 260 
9 1 0 0 0 1 0 0 0 0 0 1 0 1 0 210 
10 0 1 0 0 0 1 0 0 1 0 0 0 1 0 110 
11 0 0 1 0 0 0 0 1 0 0 1 0 0 1 40 
12 0 0 0 1 0 0 1 0 0 0 1 1 0 0 80
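A dummy table in this layout can also be built in base R without extra packages. The sketch below uses the values from the first table; the helper name one_hot is hypothetical. It also shows where the separate Industry.customized and Customized columns come from: R treats "Customized" and "customized" as two different levels, so the text is lower-cased before the dummies are created.

```r
# Data from the first table in the question
df <- data.frame(
  ID = 1:12,
  Customer = rep(c("A", "B", "C", "D"), times = 3),
  Country = rep(c("China", "Europe", "US", "Rus"), times = 3),
  Industry = c("mass", "customized", "mass", "mass", "mass", "customized",
               "mass", "customized", "Customized", "mass", "Customized", "customized"),
  Machine.type = c("A1", "A2", "A1", "A3", "A2", "A1", "A4", "A2", "A2", "A1", "A4", "A2"),
  Service.hours = c(120, 400, 60, 250, 480, 300, 250, 260, 310, 110, 40, 80)
)

# Hypothetical helper: one 0/1 column per level of each listed variable
one_hot <- function(data, vars) {
  blocks <- lapply(vars, function(v) {
    f <- factor(data[[v]])
    m <- sapply(levels(f), function(l) as.integer(f == l))
    colnames(m) <- paste(v, levels(f), sep = ".")
    m
  })
  do.call(cbind, blocks)
}

vars <- c("Customer", "Country", "Industry", "Machine.type")

# Without cleaning, Industry produces three columns instead of two
raw <- one_hot(df, vars)
print(grep("^Industry", colnames(raw), value = TRUE))

# Normalise the case first, then build the dummy table
df$Industry <- tolower(df$Industry)
dummies <- data.frame(ID = df$ID, one_hot(df, vars), Service.hours = df$Service.hours)
print(dummies)
```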

Answer 1

How you handle a categorical predictor depends on how many levels it can take.

For predictors such as gender, which can take only two values (male or female), you can simply represent them as a binary (1/0) variable.
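In the question's data, Industry is such a two-level predictor (mass or customized, once the inconsistent capitalisation is ignored). A minimal sketch: a single indicator column carries all the information, with mass as the reference level; the name industry_customized is only an illustrative choice.

```r
# Industry column from the question
industry <- c("mass", "customized", "mass", "mass", "mass", "customized",
              "mass", "customized", "Customized", "mass", "Customized", "customized")

# One 0/1 column: 1 = customized, 0 = mass (case is ignored)
industry_customized <- as.integer(tolower(industry) == "customized")
print(industry_customized)
```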

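For predictors with more than two levels, use dummy (one-of-k) coding, where k is the number of levels the variable takes. See the dummies package for useful functions. If the model also has an intercept, keep only k-1 of the dummy columns and treat the omitted level as the reference: the k columns always sum to 1, so together with the intercept they are perfectly collinear and their coefficients cannot be estimated.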
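The collinearity problem with an intercept can be checked directly on Machine type from the question (k = 4 levels). A minimal base-R sketch: with an intercept, the full set of k dummies is rank-deficient, while the treatment coding that model.matrix() applies by default keeps k-1 of them and uses the first level (A1) as the reference.

```r
# Machine type from the question: k = 4 levels (A1 to A4)
machine <- factor(c("A1", "A2", "A1", "A3", "A2", "A1", "A4", "A2", "A2", "A1", "A4", "A2"))

# All k dummies plus an intercept: the dummies sum to the intercept column
full <- sapply(levels(machine), function(l) as.integer(machine == l))
X_full <- cbind(Intercept = 1, full)
cat("columns:", ncol(X_full), " rank:", qr(X_full)$rank, "\n")

# Treatment coding keeps k-1 dummies; A1 becomes the reference level
X_k1 <- model.matrix(~ machine)
print(colnames(X_k1))
cat("columns:", ncol(X_k1), " rank:", qr(X_k1)$rank, "\n")
```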

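After that, you can fit the model with nls. Unlike lm, nls does not create a coefficient for each predictor automatically: every parameter has to be named in the formula and given a starting value. Written that way, the formula looks like this:

nls(Service.hours ~ b0 + b1 * predictor1 + b2 * predictor2 + bN * predictorN, data = df,
    start = list(b0 = 0, b1 = 0, b2 = 0, bN = 0))

Replace the right-hand side with the nonlinear function you actually want to fit; summary() on the fitted object then gives a table of estimates, standard errors, t values and p-values.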
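A minimal end-to-end sketch on the question's data, assuming an exponential mean model (service hours = exp of a linear combination of dummies). That functional form is only an illustration, not something the question prescribes, and the parameter names b0 to b4 are arbitrary. Only Industry and Machine type are used, to keep five parameters for twelve rows; Customer and Country could not both enter anyway, because in this sample each customer always appears with the same country, so their dummies are collinear. Starting values are taken from lm() on log(Service.hours).

```r
# Data from the question (Customer and Country left out)
df <- data.frame(
  Industry = c("mass", "customized", "mass", "mass", "mass", "customized",
               "mass", "customized", "Customized", "mass", "Customized", "customized"),
  Machine.type = c("A1", "A2", "A1", "A3", "A2", "A1", "A4", "A2", "A2", "A1", "A4", "A2"),
  Service.hours = c(120, 400, 60, 250, 480, 300, 250, 260, 310, 110, 40, 80)
)
df$Industry <- factor(tolower(df$Industry))
df$Machine.type <- factor(df$Machine.type)

# k-1 dummies per factor; the intercept column is dropped because b0 plays that role
X <- model.matrix(~ Industry + Machine.type, data = df)[, -1]
df <- cbind(df, X)  # adds Industrymass, Machine.typeA2, Machine.typeA3, Machine.typeA4

# Assumed nonlinear form: hours = exp(linear predictor)
form <- Service.hours ~ exp(b0 + b1 * Industrymass + b2 * Machine.typeA2 +
                            b3 * Machine.typeA3 + b4 * Machine.typeA4)

# Starting values from a linear fit on the log scale, renamed to b0..b4
init <- coef(lm(log(Service.hours) ~ Industrymass + Machine.typeA2 +
                  Machine.typeA3 + Machine.typeA4, data = df))
start <- setNames(as.list(init), paste0("b", 0:4))

fit <- nls(form, data = df, start = start)
print(summary(fit))  # estimates, standard errors, t values, p-values
```

With starting values from the log-scale fit, the Gauss-Newton iterations usually converge quickly for this form; for a different nonlinear function, the starting values need more thought.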

Nonlinear regression in R with multiple categorical dependent variables