Question

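I'm very new to R, and I'm trying to map data dictionary definitions onto a data set to make the text more readable.

For example, using the data dictionary for the Ames, Iowa housing data set currently on Kaggle, I'm trying to map the zoning classification of each house.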

> table(origData$MSZoning)

C (all)      FV      RH      RL      RM 
     10      65      16    1151     218

However, the original data set does not contain values for all of the codes in the data dictionary.

> table(housingData$MSZoning, origData$MSZoning)

                               C (all)   FV   RH   RL   RM
  Agriculture                       10    0    0    0    0
  Commercial                         0   65    0    0    0
  Floating Village Residential       0    0   16    0    0
  Industrial                         0    0    0 1151    0
  Residential High Density           0    0    0    0  218

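After mapping with my code, the key-value pairs don't align. (For example, Agriculture is mapped to "C".) I believe the null values in the source data are throwing off my mapping.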

{{1}}

What is a more appropriate way to ensure these keys and values align correctly?

Answer 1

Using the recode command, I was able to get this code working.

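library(car)  # provides recode()

# Map each zoning code to its data dictionary label. Codes are matched by
# value, so codes that never occur in the data (A, I, RP) are simply unused.
housingData$MSZoning <- recode(housingData$MSZoning,
  "'A'='Agriculture';
  'C (all)'='Commercial';
  'FV'='Floating Village Residential';
  'I'='Industrial';
  'RH'='Residential High Density';
  'RL'='Residential Low Density';
  'RP'='Residential Low Density Park';
  'RM'='Residential Medium Density'"
)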

Now, running the cross-tabulation again, I see that the values map correctly.

> table(housingData$MSZoning, origData$MSZoning)

                               C (all)   FV   RH   RL   RM
  Commercial                        10    0    0    0    0
  Floating Village Residential       0   65    0    0    0
  Residential High Density           0    0   16    0    0
  Residential Low Density            0    0    0 1151    0
  Residential Medium Density         0    0    0    0  218
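
A base-R alternative, in case loading car is not desirable: look the labels up in a named vector, so each code is matched by name rather than by position. The sketch below is only an illustration under assumptions. The `codes` vector is a made-up stand-in for origData$MSZoning, and the positional relabelling is just one plausible way the misalignment in the question could arise, since the asker's actual code is not shown.

```r
# Hypothetical stand-in for origData$MSZoning: only 5 of the 8 dictionary
# codes occur in the data.
codes <- c("RL", "RM", "C (all)", "FV", "RH", "RL")

# Full data dictionary as a named vector: names are codes, values are labels.
zoning_labels <- c(
  "A"       = "Agriculture",
  "C (all)" = "Commercial",
  "FV"      = "Floating Village Residential",
  "I"       = "Industrial",
  "RH"      = "Residential High Density",
  "RL"      = "Residential Low Density",
  "RP"      = "Residential Low Density Park",
  "RM"      = "Residential Medium Density"
)

# Positional relabelling (assumed failure mode): the factor has only five
# levels, so the first five labels are applied in order, whatever they mean.
by_position <- factor(codes)
levels(by_position) <- unname(zoning_labels)[seq_along(levels(by_position))]

# Lookup by name: each code selects its own label, and dictionary entries
# that never occur in the data are ignored.
by_name <- unname(zoning_labels[codes])

# Cross-tabulate to check the alignment, as in the answer above.
table(by_name, codes)
```

Because the lookup is keyed on the code itself, adding or removing dictionary entries cannot shift the other labels. A code that is missing from the dictionary comes back as NA, which makes gaps easy to spot.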

Mapping data in R with missing values