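I'm new to Python and am working through Kaggle Learn. In one of the courses, they talk about encoders. For one type of encoder, they don't specify the columns to encode inside the encoder's declaration. For example: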
import category_encoders as ce
cat_features = ['category', 'currency', 'country'] # these are the columns we want to encode
count_enc = ce.CountEncoder() # declaration of Encoder
count_encoded = count_enc.fit_transform(ks[cat_features]) #ks is the dataframe
data = baseline_data.join(count_encoded.add_suffix("_count")) # joins the encoded df onto baseline_data,
# with '_count' appended to the column names
Then, in another exercise, they do the following:
count_enc = CountEncoder(cols=cat_features) # Now they define the columns
count_enc.fit(train[cat_features]) # learns the category counts from the training data only
train_encoded = train.join(count_enc.transform(train[cat_features]).add_suffix('_count')) # applies the encoding
valid_encoded = valid.join(count_enc.transform(valid[cat_features]).add_suffix('_count'))
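For context, here is a minimal sketch of how I understand count encoding with a separate fit and transform step. It is plain Python rather than category_encoders; the helper names fit_counts and transform_counts and the sample rows are hypothetical, and mapping unseen categories to 0 is my assumption, not necessarily what the library does.

```python
from collections import Counter

def fit_counts(rows, cols):
    """Learn, per column, how often each category occurs in the training rows."""
    return {col: Counter(row[col] for row in rows) for col in cols}

def transform_counts(rows, counts):
    """Replace each category with its training count (0 if unseen: an assumption)."""
    return [
        {col + "_count": counts[col].get(row[col], 0) for col in counts}
        for row in rows
    ]

# Hypothetical sample data, only for illustration
train_rows = [
    {"category": "Music", "country": "US"},
    {"category": "Music", "country": "GB"},
    {"category": "Games", "country": "US"},
]
valid_rows = [{"category": "Games", "country": "FR"}]

counts = fit_counts(train_rows, ["category", "country"])  # fit on train only
train_counts = transform_counts(train_rows, counts)
valid_counts = transform_counts(valid_rows, counts)       # reuse the train counts
```

In this sketch the counts are learned once from the training rows and then reused for the validation rows, which is what I take the fit / transform split to mean.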
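My initial idea, shown below, was to not declare anything inside the parentheses () and simply call fit_transform once on train, then transform on valid afterwards, but it was marked as incorrect.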
count_enc = ce.CountEncoder()
train_encoded = train.join(count_enc.fit_transform(train[cat_features]).add_suffix('_count'))
valid_encoded = valid.join(count_enc.transform(valid[cat_features]).add_suffix('_count'))
My question is: why do we need to explicitly declare the columns to encode? Why was I wrong in this case?