I am trying to build a movie recommendation model with Keras:
import pandas as pd
from sklearn.model_selection import train_test_split
import keras
from keras.layers import Input, Embedding, Dot, Flatten
rating = pd.read_csv("./ratings.csv",usecols=[0,1,2])
users = len(rating.userId.unique())
movies = len(rating.movieId.unique())
embed_size = 3
train, test = train_test_split(rating, test_size=0.2)
movie_input = Input(shape=[1], name="movie_in")
movie_embed = Embedding(movies, embed_size, name="movie_embed")(movie_input)
movie_vector = Flatten(name="flatten_movies")(movie_embed)
user_input = Input(shape=[1], name="user_in")
user_embed = Embedding(users, embed_size, name="user_embed")(user_input)
user_vector = Flatten(name="flatten_users")(user_embed)
prod = Dot(axes=-1, name="dot-product")([movie_vector, user_vector])
model = keras.Model(inputs=[user_input, movie_input], outputs=prod)
model.compile(optimizer='adam', loss='mse')
model.fit(x=[train.userId, train.movieId], y=train.rating,epochs=10,
verbose=0)
When I try to train the model, I get the following error:
tensorflow.python.framework.errors_impl.InvalidArgumentError:
indices[15,0]= 7438 is not in [0, 5000)
[[{{node movie_embed/embedding_lookup}} = GatherV2[Taxis=DT_INT32,
Tindices=DT_INT32, Tparams=DT_FLOAT, _class=["loc:@training/Adam/Assign_2"],
_device="/job:localhost/replica:0/task:0/device:CPU:0"]
(movie_embed/embeddings/read, movie_embed/Cast,
training/Adam/gradients/movie_embed/embedding_lookup_grad/concat/axis)]]
However, most online tutorials use the same code, and it works fine for them.
Answer 0 (score: 0)
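Your movie_embed Embedding layer (essentially a lookup table) has 5000 rows, so it expects integer inputs in the half-open interval [0, 5000), that is, 0 through 4999. You are passing it 7438, which causes the error. rating.movieId may well contain 5000 unique values, but some of those values clearly fall outside [0, 5000). For this to work, you need to map the movieId values to integers in this interval, and do the same for userId so that it fits the range of user_embed.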
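One way to do that remapping is sketched below in plain Python. The helper name build_index and the sample IDs are made up for illustration and are not part of the original code; the idea is simply to assign each distinct raw ID a contiguous index starting at 0.

```python
def build_index(ids):
    """Map each distinct raw ID to a contiguous integer in [0, n_unique)."""
    return {raw: idx for idx, raw in enumerate(sorted(set(ids)))}


# Hypothetical sample standing in for rating.movieId; note the gaps in the IDs.
raw_movie_ids = [1, 7438, 31, 7438, 1, 260]

movie_index = build_index(raw_movie_ids)
encoded = [movie_index[m] for m in raw_movie_ids]
n_movies = len(movie_index)  # size to use as the Embedding layer's input_dim

print(movie_index)  # {1: 0, 31: 1, 260: 2, 7438: 3}
print(encoded)      # [0, 3, 1, 3, 0, 2]
print(n_movies)     # 4
```

With the DataFrame from the question, the same dictionary could presumably be applied with rating.movieId.map(movie_index), and likewise for userId, before calling train_test_split so that the training and test sets share one mapping.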