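I have a very simple sequential Keras model that I want to load for inference on a Spark DataFrame. To do this, I want to use sparkdl.KerasTransformer. If I train the model, I can load it from the h5 file with tensorflow.keras.models.load_model and run inference on a numpy.ndarray without any problem. However, when I load it through sparkdl.KerasTransformer and apply it to a DataFrame, I get:
TypeError: tuple indices must be integers or slices, not list
Here is a minimal example that includes the two layer types I want to use.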
import numpy, pandas, tensorflow.keras, sparkdl

# placeholder path (the value is not shown in the post); the model is saved as h5
model_filename = "model.h5"

def build_model():
    n0 = tensorflow.keras.layers.BatchNormalization(input_shape=(3,),name='n0')
    s = tensorflow.keras.layers.Dense(1,activation='sigmoid',name='s')
    m = tensorflow.keras.models.Sequential()
    m.add(n0)
    m.add(s)
    m.build(input_shape=(3,))
    return m

# get some data (yes, it's noise, but that's not the issue here)
X = numpy.random.randn(100,3)
y = numpy.random.choice([0,1],size=100)
# build and fit a model
model = build_model()
model.compile(optimizer='adadelta',loss='binary_crossentropy')
history = model.fit(X,y,batch_size=32,epochs=8,verbose=0)
# save the model
model.save(model_filename)
# load the model and compare predictions (no error loading or executing the model through Keras)
m1 = tensorflow.keras.models.load_model(model_filename)
pred = model.predict(X)
pred1 = m1.predict(X)
print(numpy.abs(pred-pred1).max()) # predictions between trained and loaded model agree
# convert the data to a spark DF (`spark` is assumed to be an existing SparkSession)
df = pandas.DataFrame({"features":X.tolist(),"targets":y,"scores":pred[:,0]})
sparkDF = spark.createDataFrame(df)
# load the model as a sparkdl.KerasTransformer
transformer = sparkdl.KerasTransformer(inputCol="features",outputCol="scoreUDF",modelFile=model_filename)
# apply the model to the dataframe; THIS PRODUCES THE ERROR
sparkDF1 = transformer.transform(sparkDF)

The culprit seems to be in ...python3.6/site-packages/keras/layers/normalization.py, but I can't find a workaround short of hacking the code and recompiling.
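For reference, the message itself is what plain Python raises whenever a tuple is indexed with a list. The snippet below is only a minimal sketch of that mechanism, under the assumption (which I have not confirmed against the library source) that the normalization layer ends up indexing a shape tuple with a list-valued axis. The names `input_shape` and `axis` are illustrative stand-ins, not Keras internals.

```python
# Illustrative only: reproduces the error message with plain Python objects.
# `input_shape` and `axis` are hypothetical stand-ins, not Keras internals.
input_shape = (None, 3)   # a shape tuple: (batch, features)
axis = [1]                # an axis stored as a list instead of an int

message = None
try:
    dim = input_shape[axis]       # indexing a tuple with a list fails
except TypeError as exc:
    message = str(exc)

# Indexing with the integer held inside the list works as expected.
dim = input_shape[axis[0]]

print(message)
print(dim)
```

If that assumption holds, it might explain why the model behaves when loaded with tensorflow.keras.models.load_model but not when loaded through sparkdl.KerasTransformer, though I can't say so for certain.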
I'm running Python 3.6 and Spark 2.4.0.
Any suggestions or help would be welcome.