Below is the list of unique values in my dataset, so I have the following independent variables:
<br/><br/>
fueltype ['gas' 'diesel']<br/>
aspiration ['std' 'turbo']<br/>
doornumber ['two' 'four']<br/>
carbody ['convertible' 'hatchback' 'sedan' 'wagon' 'hardtop']<br/>
drivewheel ['rwd' 'fwd' '4wd']<br/>
enginelocation ['front' 'rear']<br/>
enginetype ['dohc' 'ohcv' 'ohc' 'l' 'rotor' 'ohcf' 'dohcv']<br/>
cylindernumber ['four' 'six' 'five' 'three' 'twelve' 'two' 'eight']<br/>
fuelsystem ['mpfi' '2bbl' 'mfi' '1bbl' 'spfi' '4bbl' 'idi' 'spdi']<br/>
I want to build a model that predicts car price using multiple linear regression. Do I need to map the data like this?<br/>
df['fueltype'] = df['fueltype'].map({'gas': 1, 'diesel': 0})<br/>
df['aspiration'] = df['aspiration'].map({'std': 1, 'turbo': 0})<br/>
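If so, how should I handle a column that has 8 distinct categorical values (for example, fuelsystem)?<br/>
Also, when nearly every independent variable is categorical, how can I check for multicollinearity?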
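
To make the two questions concrete, here is a minimal sketch of one commonly used approach, not necessarily the right one for this dataset: one-hot encode the multi-level columns with `pd.get_dummies` (dropping one level per column), then compute a variance inflation factor (VIF) for each resulting dummy column. The rows below are made up for illustration, the toy data is deliberately built so that `diesel` always pairs with `idi`, and `vif` is a hypothetical helper written here, not a library function.

```python
import numpy as np
import pandas as pd

# Toy rows, invented for illustration; column names follow the question.
# 'diesel' is deliberately always paired with 'idi' to create collinearity.
df = pd.DataFrame({
    "fueltype":   ["gas", "diesel", "gas", "gas", "diesel", "gas", "gas", "diesel"],
    "fuelsystem": ["mpfi", "idi", "2bbl", "mpfi", "idi", "1bbl", "2bbl", "idi"],
})

# One-hot encode every categorical column. drop_first=True removes one level
# per column so the dummies are not perfectly collinear with the intercept.
X = pd.get_dummies(df, columns=["fueltype", "fuelsystem"],
                   drop_first=True, dtype=float)

def vif(X):
    """Hypothetical helper: VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from
    regressing column j on all other columns plus an intercept."""
    out = {}
    for col in X.columns:
        y = X[col].to_numpy()
        others = X.drop(columns=col).to_numpy()
        A = np.column_stack([np.ones(len(X)), others])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        r2 = 1.0 - (resid @ resid) / ((y - y.mean()) ** 2).sum()
        # A (numerically) perfect fit means the column is fully redundant.
        out[col] = np.inf if r2 >= 1 - 1e-12 else 1.0 / (1.0 - r2)
    return pd.Series(out)

print(X.columns.tolist())
print(vif(X))
```

In this toy data `fueltype_gas` and `fuelsystem_idi` carry the same information, so both get an infinite VIF; whether the same happens in the real dataset would have to be checked on the actual rows.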