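My y_train consists of ingredients: each entry holds several different ingredients separated by commas. This is essentially a multi-class (more precisely, multi-label) classification problem. My y_train looks like this: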
I tried to encode this categorical variable with sklearn's LabelEncoder.
df['ingredients_str'].head()
0 romaine lettuce,black olives,grape tomatoes
1 plain flour,ground pepper,salt,tomatoes
2 eggs,pepper,salt,mayonaise,cooking oil
3 water,vegetable oil,wheat,salt
4 black pepper,shallots,cornflour,cayenne
Name: ingredients_str, dtype: object
How can I transform this column with LabelEncoder?
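For comparison, here is a minimal sketch of what fitting `LabelEncoder` directly on the raw string column does. The DataFrame is a hypothetical reconstruction of the five rows printed above. Each whole comma-joined string becomes one opaque class, so rows that share an ingredient such as `salt` end up with unrelated codes:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical reconstruction of the five rows printed above
df = pd.DataFrame({'ingredients_str': [
    'romaine lettuce,black olives,grape tomatoes',
    'plain flour,ground pepper,salt,tomatoes',
    'eggs,pepper,salt,mayonaise,cooking oil',
    'water,vegetable oil,wheat,salt',
    'black pepper,shallots,cornflour,cayenne',
]})

# Encoding the whole string treats every distinct recipe as a single class
le = LabelEncoder()
codes = le.fit_transform(df['ingredients_str'])
print(codes.tolist())    # [3, 2, 1, 4, 0]
print(len(le.classes_))  # 5: one class per row, individual ingredients are lost
```

This is why the column has to be split into individual ingredients before encoding.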
Answer
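IIUC, you can expand the label column into one column per ingredient, encode every column with a single LabelEncoder fitted on all ingredients, and then collapse each row of the encoded DataFrame back into a single list-valued column.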
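import numpy as np
import pandas as pd
from sklearn import preprocessing
# Expand the comma-separated strings into one column per ingredient position; pad shorter rows with ''
labels = pd.DataFrame(df['ingredients_str'].str.split(',').values.tolist()).fillna('')
# Fit one encoder on every ingredient from every column (the '' padding gets a code too)
le = preprocessing.LabelEncoder()
le.fit([item for sublist in labels.values for item in sublist])
# Encode each column, then transpose so that rows are samples again
labels = pd.DataFrame(np.transpose([le.transform(labels[col]) for col in labels.columns]))
# Collapse each row back into a single list-valued column (axis=1 works row by row)
labels.apply(lambda row: row.tolist(), axis=1)
0 [12, 1, 7, 0, 0]
1 [11, 8, 13, 15, 0]
2 [6, 10, 13, 9, 4]
3 [17, 16, 18, 13, 0]
4 [2, 14, 5, 3, 0]
dtype: object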
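Because each row is really a set of ingredients, an alternative worth considering (a different technique from the LabelEncoder approach above, not the answer's method) is scikit-learn's `MultiLabelBinarizer`. It turns each row into a 0/1 indicator vector over the full ingredient vocabulary, so no `''` padding is needed and the result is the fixed-width target that multi-label estimators usually expect. A minimal sketch, again using a hypothetical hand-built copy of the sample rows:

```python
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical reconstruction of the sample rows from the question
s = pd.Series([
    'romaine lettuce,black olives,grape tomatoes',
    'plain flour,ground pepper,salt,tomatoes',
    'eggs,pepper,salt,mayonaise,cooking oil',
    'water,vegetable oil,wheat,salt',
    'black pepper,shallots,cornflour,cayenne',
], name='ingredients_str')

# Each row becomes a list of ingredient names
ingredient_lists = s.str.split(',')

# One output column per distinct ingredient, in sorted order (mlb.classes_)
mlb = MultiLabelBinarizer()
y = mlb.fit_transform(ingredient_lists)

print(y.shape)                               # (5, 18)
print(mlb.classes_[y[3] == 1].tolist())      # ['salt', 'vegetable oil', 'water', 'wheat']

# The indicator matrix decodes back to tuples of ingredient names
decoded = mlb.inverse_transform(y[:1])
print(decoded)
```

Shared ingredients now map to the same column (for example, `salt` is set in rows 1, 2 and 3), which the single-string encoding cannot express.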