Pandas: converting embedding objects to float

Time: 2020-02-06 19:10:34

Tags: python pandas machine-learning

I have a movie dataframe with several columns of text/categorical variables. I used SentenceTransformer to convert that text into the corresponding embeddings:

from sentence_transformers import SentenceTransformer
model = SentenceTransformer('bert-base-nli-mean-tokens')
df['movieName_embed'] = df.apply(lambda row : model.encode(row['movieName']), axis = 1)
df['usertags_embed'] = df.apply(lambda row : model.encode(row['usertags']), axis = 1)

After this embedding step and several other encoding techniques, the dataframe looks like this:

[image: screenshot of the resulting dataframe]

Then I create the features and the target as follows:

from sklearn.model_selection import train_test_split

X = df[['movieName_embed', 'usertags_embed', 'rating']]
y = df[['genre_fe']]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=123)

The movieName_embed and usertags_embed columns are of type list of list of numbers, which is not suitable for training in xgboost. So when I run xgboost.XGBRegressor.fit(X_train, y_train), I get this error:

ValueError: DataFrame.dtypes for data must be int, float or bool.
                Did not expect the data types in fields movieName_embed, usertags_embed
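
For context, pandas stores a column whose cells hold Python lists or arrays with dtype object, which is what the message above is complaining about. A minimal sketch for spotting such columns, using a toy dataframe with 3-dimensional vectors as a stand-in for the real embeddings (the toy values are assumptions, not the original data):

```python
import pandas as pd

# Toy stand-in for the real dataframe: short vectors instead of BERT embeddings.
toy = pd.DataFrame({
    "movieName_embed": [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]],
    "usertags_embed": [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]],
    "rating": [3.5, 4.0],
})

# Columns whose cells are lists are stored as dtype object, not float.
print(toy.dtypes)

# Collect the names of the object-typed columns that need converting.
object_cols = toy.select_dtypes(include="object").columns.tolist()
print(object_cols)
```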

So how can I convert the embeddings so that they are suitable for training?
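
One possible approach, offered as a sketch rather than a confirmed answer, is to expand each embedding column into one float column per dimension, so that every column in the feature table has a numeric dtype. The helper name expand_embedding_column and the toy 3-dimensional vectors below are hypothetical; the sketch also assumes every vector in a column has the same length.

```python
import numpy as np
import pandas as pd

def expand_embedding_column(df, col):
    """Hypothetical helper: one float column per embedding dimension."""
    # Flatten each cell (handles nested lists such as [[...]]) and stack
    # the rows into a 2-D array of shape (n_rows, embedding_dim).
    matrix = np.vstack([np.asarray(v, dtype=np.float32).ravel() for v in df[col]])
    names = [f"{col}_{i}" for i in range(matrix.shape[1])]
    # Reuse the original index so the result lines up with the other columns.
    return pd.DataFrame(matrix, columns=names, index=df.index)

# Toy stand-in for the real dataframe (assumed values, 3-dimensional vectors).
toy = pd.DataFrame({
    "movieName_embed": [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]],
    "usertags_embed": [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]],
    "rating": [3.5, 4.0],
})

# Build a feature table that contains only float columns.
X_flat = pd.concat(
    [
        expand_embedding_column(toy, "movieName_embed"),
        expand_embedding_column(toy, "usertags_embed"),
        toy[["rating"]],
    ],
    axis=1,
)
print(X_flat.dtypes)
```

With the real data, a table built this way would take the place of X before the train_test_split call. Whether hundreds of embedding dimensions per text column work well as tree-model features is a separate question that this sketch does not address.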

0 answers:

No answers