Encoder input

Time: 2019-12-06 18:35:04

Tags: python encoding

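I'm new to Python and am working through Kaggle Learn. One of the courses covers encoders. For one type of encoder, they don't specify the columns to encode in the encoder's declaration. For example: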

import category_encoders as ce
cat_features = ['category', 'currency', 'country'] # these are the columns we want to encode
count_enc = ce.CountEncoder() # declaration of Encoder
count_encoded = count_enc.fit_transform(ks[cat_features]) #ks is the dataframe
data = baseline_data.join(count_encoded.add_suffix("_count")) # joins on encoded df to baseline_data
                                                              # with column names + '_count'
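For context, count encoding replaces each category value with the number of times it appears in its column. Below is a minimal pandas-only sketch of that idea. The tiny `ks` frame and its values are invented for illustration, and this is not a claim about how `category_encoders` implements it internally.

```python
import pandas as pd

# Hypothetical stand-in for the ks dataframe; the values are invented.
ks = pd.DataFrame({
    "category": ["Music", "Film", "Music", "Games"],
    "currency": ["USD", "USD", "GBP", "USD"],
    "country": ["US", "US", "GB", "US"],
})
cat_features = ["category", "currency", "country"]

# Replace every value with how often it occurs in its own column.
count_encoded = ks[cat_features].apply(lambda col: col.map(col.value_counts()))

# Same naming convention as above: original column name + "_count".
print(count_encoded.add_suffix("_count"))
```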

Then, in another exercise, they do the following:

count_enc = ce.CountEncoder(cols=cat_features) # now the columns are passed explicitly
count_enc.fit(train[cat_features]) # learns the counts from the training data only
train_encoded = train.join(count_enc.transform(train[cat_features]).add_suffix('_count')) # applies the encoding to train
valid_encoded = valid.join(count_enc.transform(valid[cat_features]).add_suffix('_count')) # applies the same encoding to valid

My initial attempt, below, didn't pass anything inside the (). I called fit_transform once on train and then transform on valid, but it was marked as incorrect.

count_enc = ce.CountEncoder()
train_encoded = train.join(count_enc.fit_transform(train[cat_features]).add_suffix('_count'))
valid_encoded = valid.join(count_enc.transform(valid[cat_features]).add_suffix('_count'))

My question is: why do we need to explicitly declare the columns to encode? Why was my version wrong in this case?
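Both snippets above follow the same pattern: learn the counts from the training rows only, then apply them to train and valid. Here is a rough pandas-only sketch of that pattern, in case it helps with comparing the two. The helpers `fit_counts` and `apply_counts` and the toy `train`/`valid` frames are hypothetical, and filling unseen categories with 0 is an assumption of this sketch, not necessarily what `CountEncoder` does.

```python
import pandas as pd

def fit_counts(df, cols):
    """Learn per-column value counts from df (hypothetical helper)."""
    return {col: df[col].value_counts() for col in cols}

def apply_counts(df, counts):
    """Map each value to its learned count; unseen values become 0 (assumption)."""
    encoded = pd.DataFrame(index=df.index)
    for col, table in counts.items():
        encoded[col] = df[col].map(table).fillna(0).astype(int)
    return encoded

# Toy data, invented for illustration.
train = pd.DataFrame({"country": ["US", "US", "GB"], "currency": ["USD", "USD", "GBP"]})
valid = pd.DataFrame({"country": ["US", "CA"], "currency": ["USD", "CAD"]})
cat_features = ["country", "currency"]

counts = fit_counts(train, cat_features)  # learned from train only
train_encoded = train.join(apply_counts(train, counts).add_suffix("_count"))
valid_encoded = valid.join(apply_counts(valid, counts).add_suffix("_count"))
print(valid_encoded)
```

In this sketch the columns are always named explicitly, so it says nothing about how an encoder picks columns when none are given.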

0 answers:

No answers