Question

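Recently, I stumbled upon a very nice GitHub repository called SHAP, written in Python and JavaScript. It can be used to explain the output of any machine learning model. You can watch a nice video here that illustrates the library well.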

I have run into a possible problem using this library on a classification problem written in Python.

EXPLANATION:

I am using exactly 103 features to model three labels, encoded as -1, 0 and 1. All my features are well defined and each one has an associated title. My problem is that I do not pass 103 features at each time unit. At each time unit I pass 103*15 features, that is, the 103 features at the current time t_0 and the same features at times t_-1, ..., t_-14. My training dataset has shape (35087, 15, 103), where the first dimension corresponds to 35087 seconds.

Apparently, this example shows how I can use it:

    import shap

    # we use the first 100 training examples as our background dataset to integrate over
    explainer = shap.DeepExplainer(model, x_train[:100])

    # explain the first 10 predictions
    # explaining each prediction requires 2 * background dataset size runs
    shap_values = explainer.shap_values(x_test[:10])

    # init the JS visualization code
    shap.initjs()

    # transform the indexes to words
    import numpy as np
    words = imdb.get_word_index()
    num2word = {}
    for w in words.keys():
        num2word[words[w]] = w
    x_test_words = np.stack([np.array(list(map(lambda x: num2word.get(x, "NONE"), x_test[i]))) for i in range(10)])

    # plot the explanation of the first prediction
    # Note the model is "multi-output" because it is rank-2 but only has one column
    shap.force_plot(explainer.expected_value[0], shap_values[0][0], x_test_words[0])

What I would like to get is simply

How should I handle the features at times t_-1, ..., t_-14 to get the contribution of each feature?

UPDATE

Here is an update on what I have tried:

At first, I have 103 features and 3 labels. X_train.shape is (35087, 15, 103) and X_test.shape is (11696, 15, 103).

    explainer = shap.DeepExplainer(Model.model, X_train[:100])
    shap_values = explainer.shap_values(X_test[:10])

Here shap_values appears to be a list of three arrays, each of shape (10, 15, 103).

    X_test_flatten = X_test.flatten()
    shap.summary_plot(shap_values, X_test_flatten, features_names=FEATURES)

From there I get the error

    *** IndexError: index 40 is out of bounds for axis 0 with size 15

Can someone help me obtain the information described above?
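One thing that may be worth checking: X_test.flatten() collapses the whole array into a single dimension, so it no longer lines up with the (10, 15, 103) SHAP arrays, which were also computed only for X_test[:10]. Below is a minimal sketch of one possible workaround, using random NumPy arrays as stand-ins for the real data. It reshapes both the inputs and the per-class SHAP arrays to two dimensions and builds one name per (feature, lag) pair. The FEATURES names, the "@t-k" naming scheme, and the assumption that index 14 on the time axis is t_0 are all hypothetical, and the commented-out summary_plot call has not been run against the model in question.

```python
import numpy as np

# Synthetic stand-ins with the shapes from the question (values are random).
n_samples, n_steps, n_features = 10, 15, 103
rng = np.random.default_rng(0)
X_test = rng.normal(size=(n_samples, n_steps, n_features))

# One array per class, matching what DeepExplainer is reported to return.
shap_values = [rng.normal(size=X_test.shape) for _ in range(3)]

# Hypothetical feature titles; replace with the real FEATURES list.
FEATURES = [f"f{j}" for j in range(n_features)]

# Assumption: the time axis runs oldest to newest, so step 14 is lag 0 (t_0).
# The loop order (step outer, feature inner) matches NumPy's row-major reshape.
flat_names = [
    f"{name}@t-{n_steps - 1 - step}"
    for step in range(n_steps)
    for name in FEATURES
]

# Keep the sample axis, merge (time step, feature) into one axis of 15 * 103.
X_flat = X_test.reshape(n_samples, -1)
shap_flat = [sv.reshape(n_samples, -1) for sv in shap_values]

# With the real data, a 2-D call for one class might then look like:
# shap.summary_plot(shap_flat[0], X_flat, feature_names=flat_names)
```

Each column of X_flat then corresponds to exactly one name in flat_names, so the plot would label every (feature, lag) pair separately.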

How can I get the contribution of each feature for data of shape (35087, 15, 103)?
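If one number per feature is enough, rather than one per (feature, lag) pair, a possible approach is to aggregate the SHAP arrays over the time axis. The sketch below again uses random stand-in arrays with the shapes reported above. Summing over the 15 time steps is one common convention that relies on SHAP attributions being additive; whether it is the right aggregation for this model is an assumption. The variable names (class_idx, per_sample, importance) are made up for illustration.

```python
import numpy as np

# Synthetic stand-in for the list of three (10, 15, 103) SHAP arrays.
n_samples, n_steps, n_features = 10, 15, 103
rng = np.random.default_rng(1)
shap_values = [
    rng.normal(size=(n_samples, n_steps, n_features)) for _ in range(3)
]

class_idx = 0  # hypothetical choice: which of the three labels to inspect
sv = shap_values[class_idx]

# Per-sample contribution of each feature, summed over all 15 time steps.
per_sample = sv.sum(axis=1)  # shape (10, 103)

# Global importance: mean absolute SHAP value over samples and time steps.
importance = np.abs(sv).mean(axis=(0, 1))  # shape (103,)

# Indices of the five features with the largest global importance.
top = np.argsort(importance)[::-1][:5]
```

per_sample has one row per explained prediction and one column per feature, so it could be paired with a similarly reduced (10, 103) view of the inputs for plotting.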

0 answers: