How to encode only the categorical data in a DataFrame

Date: 2018-04-20 17:17:30

Tags: pandas machine-learning data-science

How can I encode only the categorical columns of the DataFrame below, leaving the numeric columns unchanged?

Income  Length of Residence Median House Value  Number of Vehicles  Percentage Asian    Percentage Black    Percentage English Speaking Percentage Hispanic Percentage White    MakeDescr   SeriesDescr Msrp
1   90000   15.0    F   4   1   1   71  6   81  HYUNDAI Sonata-4 Cyl.   19395.0
2   125000  7.0 H   1   11  1   91  1   81  JEEP    Grand Cherokee-V6   29135.0
3   90000   8.0 F   1   1   1   71  6   86  JEEP    Liberty 20700.0
4   125000  8.0 F   3   1   1   86  6   86  VOLKSWAGEN  Passat-V6   28750.0
5   90000   8.0 F   1   1   1   71  6   81  JEEP    Wrangler    20210.0
6   110000  7.0 G   5   6   6   71  6   76  HYUNDAI Santa Fe-V6 25645.0
7   110000  7.0 G   3   11  6   71  6   71  HYUNDAI Sonata-4 Cyl.   15999.0
8   125000  8.0 G   1   1   11  81  6   76  HYUNDAI Santa Fe-V6 23645.0
9   125000  9.0 G   1   6   1   91  1   86  CHEVROLET TRUCK Trailblazer EXT 32040.0
10  110000  8.0 E   2   6   46  81  16  26  JEEP    Wrangler-V6 18660.0
11  125000  11.0    G   3   6   1   76  1   86  CHEVROLET TRUCK Silverado 2500 HD   31775.0
12  125000  12.0    G   2   11  6   66  1   71  CHEVROLET   Cobalt  13675.0
13  125000  13.0    G   2   1   16  95  6   71  HYUNDAI Veracruz-V6 28600.0
15  110000  11.0    F   5   6   41  61  11  41  HYUNDAI Santa Fe    22499.0
16  125000  9.0 F   2   1   6   91  1   81  HYUNDAI Santa Fe    22499.0
17  125000  8.0 G   2   11  11  66  1   66  MITSUBISHI  Endeavor-V6 32602.0
18  110000  12.0    E   1   6   46  81  16  26  HYUNDAI Accent-4 Cyl.   10899.0
19  90000   9.0 F   4   1   6   71  6   81  JEEP    Grand Cherokee-6 Cyl.   29080.0
21  125000  8.0 G   1   6   1   76  1   86  MITSUBISHI  Endeavor-V6 29302.0
22  110000  12.0    F   2   6   26  66  11  51  HYUNDAI Santa Fe    22499.0
23  90000   9.0 F   1   6   6   66  6   76  HYUNDAI Santa Fe-V6 20995.0
24  125000  9.0 H   1   6   1   91  1   81  HYUNDAI Sonata-V6   18799.0
25  90000   14.0    F   2   1   6   71  11  81  HYUNDAI Elantra-4 Cyl.  13299.0
26  125000  9.0 G   3   1   11  81  6   76  JEEP    Grand Cherokee-6 Cyl.   29080.0
27  125000  8.0 H   5   6   1   91  1   81  CHEVROLET TRUCK Trailblazer 29395.0
28  110000  12.0    E   4   6   41  61  11  36  HYUNDAI Sonata-4 Cyl.   15999.0
29  110000  10.0    E   1   6   41  61  11  36  HYUNDAI Santa Fe-V6 20995.0
30  125000  10.0    F   2   6   1   71  6   86  CHEVROLET TRUCK Tahoe   37000.0
32  90000   10.0    F   1   1   1   71  6   86  MITSUBISHI  Galant-V6   19997.0
33  125000  12.0    F   1   1   1   86  6   86  CHEVROLET TRUCK Trailblazer 28175.0
... ... ... ... ... ... ... ... ... ... ... ... ...
4451    110000  9.0 F   3   6   41  61  11  36  NISSAN  Sentra-4 Cyl.   17990.0
4452    125000  11.0    G   2   1   11  81  6   76  CHEVROLET TRUCK Tahoe   39515.0
4453    125000  8.0 H   1   6   1   91  1   81  HYUNDAI Elantra-4 Cyl.  15195.0
4454    110000  10.0    F   3   6   41  61  11  41  HYUNDAI Genesis-4 Cyl.  26750.0
4455    125000  7.0 H   4   11  1   76  1   76  HYUNDAI Sonata-4 Cyl.   19695.0
4456    125000  9.0 G   5   6   1   76  1   86  NISSAN  Altima  22500.0
4457    110000  11.0    E   1   6   46  81  16  26  GMC LIGHT DUTY  Denali  51935.0
4458    125000  6.0 H   1   11  1   76  1   76  JEEP    Liberty-V6  24865.0
4459    125000  12.0    G   3   1   16  95  6   71  HONDA   Accord-V6   26700.0
4460    125000  7.0 F   1   1   1   86  6   86  HYUNDAI Veloster-4 Cyl. 17300.0
4461    90000   10.0    F   2   6   11  66  6   71  CADILLAC    SRX-V6  42210.0
4463    110000  8.0 F   3   6   26  61  11  56  GMC LIGHT DUTY  Acadia  42390.0
4468    125000  8.0 G   1   1   1   91  1   86  HONDA   Pilot-V6    40820.0
4469    125000  10.0    H   5   11  1   91  1   81  TOYOTA  Highlander-V6   30695.0
4470    110000  12.0    F   1   6   41  61  11  41  HYUNDAI Elantra-4 Cyl.  15195.0
4473    110000  13.0    F   1   6   21  66  6   61  ACURA   TSX 32910.0
4476    125000  9.0 G   1   6   1   76  1   86  BMW X3  36750.0
4482    125000  10.0    H   1   6   1   91  1   81  SUBARU  Forester-4 Cyl. 21195.0
4486    125000  11.0    H   2   6   1   91  1   81  GMC LIGHT DUTY  Yukon XL    44315.0
4492    125000  10.0    H   2   6   1   91  1   81  BMW 5 Series    53400.0
4493    110000  12.0    G   2   6   6   71  6   76  ACURA   TL  33725.0
4494    125000  12.0    F   3   1   1   86  6   86  ACURA   TL  33725.0
4495    125000  12.0    F   3   1   1   86  6   86  ACURA   TL  33725.0
4496    125000  7.0 G   5   1   11  81  6   76  ACURA   TL  33325.0
4497    125000  9.0 G   1   6   1   76  1   86  ACURA   TL  33725.0
4498    125000  12.0    G   3   1   11  81  6   76  ACURA   TL  33725.0
4499    110000  14.0    G   8   11  6   71  6   71  ACURA   TL  33725.0
4501    125000  9.0 G   3   11  6   66  1   71  FORD    Taurus-V6   20050.0
4502    110000  2.0 G   4   11  6   71  6   71  DODGE   Stratus-4 Cyl.  15910.0
4503    125000  8.0 F   1   1   1   86  6   86  DODGE   Stratus-4 Cyl.  19145.0

1 Answer:

Answer 0 (score: 0)

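# Using the standard scikit-learn label encoder.
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()

# Encode all string columns, assuming every categorical column has dtype object (str).
# df is the DataFrame from the question; numeric columns are not selected and stay unchanged.
for c in df.select_dtypes(include=['object']).columns:
    print("Encoding column " + c)
    # fit_transform refits le on every call, so only the last column's mapping is kept.
    df[c] = le.fit_transform(df[c])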
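
If the integer codes need to be mapped back to the original strings later, one option is to keep a separate encoder per column. The following is a minimal sketch of that idea, not part of the original answer. The small frame reuses a few values from the question's table, and the names `categorical_cols`, `encoders`, and `encoded` are made up for illustration. It assumes every non-numeric column is categorical, which would also pick up boolean or datetime columns if the frame had any.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Tiny illustrative frame; values are taken from the first rows of the question's table.
df = pd.DataFrame({
    "Income": [90000, 125000, 90000, 125000],
    "Median House Value": ["F", "H", "F", "F"],
    "MakeDescr": ["HYUNDAI", "JEEP", "JEEP", "VOLKSWAGEN"],
    "Msrp": [19395.0, 29135.0, 20700.0, 28750.0],
})

# Assumption: every column that is not numeric should be treated as categorical.
categorical_cols = list(df.select_dtypes(exclude="number").columns)

encoders = {}        # column name -> fitted LabelEncoder (hypothetical helper dict)
encoded = df.copy()  # leave the original frame untouched
for col in categorical_cols:
    le = LabelEncoder()
    encoded[col] = le.fit_transform(encoded[col])
    encoders[col] = le

print(encoded)

# The stored encoder can translate codes back into the original labels.
print(encoders["MakeDescr"].inverse_transform(encoded["MakeDescr"]))
```

Two caveats seem worth keeping in mind. The scikit-learn documentation describes LabelEncoder as intended for target values rather than input features, and OrdinalEncoder is the feature-oriented counterpart. Integer codes also imply an ordering that columns such as MakeDescr do not have, so for some models one-hot encoding (for example with pandas.get_dummies) may be a better fit.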