Question

Here is my code, in case you want to run it:

import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from treeinterpreter import treeinterpreter as ti
import operator


X, y = make_classification(n_samples=1000,
                           n_features=6,
                           n_informative=3,
                           n_classes=2,
                           random_state=0,
                           shuffle=False)

# Create a DataFrame
df = pd.DataFrame({'Feature 1': X[:, 0],
                   'Feature 2': X[:, 1],
                   'Feature 3': X[:, 2],
                   'Feature 4': X[:, 3],
                   'Feature 5': X[:, 4],
                   'Feature 6': X[:, 5],
                   'Class': y})


y_train = df['Class']
X_train = df.drop('Class',axis = 1)

rf = RandomForestClassifier(n_estimators=50,
                               random_state=0)

rf.fit(X_train, y_train)


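importances = rf.feature_importances_
# Note: the next line overwrites the values above, so `importances` ends up holding the feature names
importances = X_train.columns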

instances = X_train.iloc[[60]]


prediction, biases, contributions = ti.predict(rf, instances)

I tried to sort the list in two ways. First, using itemgetter:

for i in range(len(instances)):
    for c, feature in sorted(zip(contributions[i], importances), key=operator.itemgetter(1)):
        print (feature, np.round(c, 5))

Second, using key=lambda:

for i in range(len(instances)):
    for c, feature in sorted(zip(contributions[i], importances), key=lambda x: x[0].any()):
        print (feature, np.round(c, 5))

But both solutions produce the same output:

Feature 1 [ 0.16033 -0.16033]
Feature 2 [-0.02422  0.02422]
Feature 3 [-0.15412  0.15412]
Feature 4 [ 0.17162 -0.17162]
Feature 5 [ 0.02897 -0.02897]
Feature 6 [ 0.01889 -0.01889]

I want to sort the list by the first column of the output above. Any idea what I am doing wrong?

Update 2: just to clarify the question

I want to sort by the values inside the brackets, not by the feature name.

Update 3: what the output should look like when sorting by the first column

Feature 4 [ 0.17162 -0.17162]
Feature 1 [ 0.16033 -0.16033]
Feature 5 [ 0.02897 -0.02897]
Feature 6 [ 0.01889 -0.01889]
Feature 2 [-0.02422  0.02422]
Feature 3 [-0.15412  0.15412]

What the output should look like when sorting by the second column

Feature 3 [-0.15412  0.15412]
Feature 2 [-0.02422  0.02422]
Feature 6 [ 0.01889 -0.01889]
Feature 5 [ 0.02897 -0.02897]
Feature 1 [ 0.16033 -0.16033]
Feature 4 [ 0.17162 -0.17162]

Update 4: include an if condition in the sort

Print only the values that are greater than 0.01 or less than -0.01.

Answer 1

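First, you need to get the data into a workable shape: contributions.shape is (1, 6, 2), that is, one instance, six features, and two classes. Taking contributions[0] makes it easy to iterate with zip: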

zip(importances, contributions[0])

This yields (name, values) pairs. Here is how to iterate over them, sorting with a lambda that chains two index operations:

for name, values in sorted(zip(importances, contributions[0]), key=lambda pair: pair[1][0]):
    print(name, values)

The lambda receives a (name, values) pair, selects values with [1], and then picks the first-column value with [0].
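To see why the two attempts in the question left the order unchanged, and how to get the largest-first order the question asks for, here is a minimal sketch. It assumes the contributions are hard-coded as plain lists copied from the output printed in the question, so it runs without scikit-learn or treeinterpreter. `sort_by_column` is a hypothetical helper name, not part of any library, and the built-in `any` stands in for NumPy's `.any()`.

```python
import operator

# Values copied from the output printed in the question.
names = [f"Feature {i}" for i in range(1, 7)]
contribs = [
    [0.16033, -0.16033],
    [-0.02422, 0.02422],
    [-0.15412, 0.15412],
    [0.17162, -0.17162],
    [0.02897, -0.02897],
    [0.01889, -0.01889],
]

# Attempt 1: in (c, feature) pairs, itemgetter(1) picks the feature name,
# so the rows are sorted by name, which is the order they already had.
by_name = sorted(zip(contribs, names), key=operator.itemgetter(1))

# Attempt 2: any(...) is True for every row, so all sort keys are equal
# and Python's stable sort keeps the original order.
by_any = sorted(zip(contribs, names), key=lambda x: any(x[0]))


def sort_by_column(names, rows, column, descending=True):
    """Hypothetical helper: sort (name, values) pairs by one column of values."""
    return sorted(zip(names, rows),
                  key=lambda pair: pair[1][column],
                  reverse=descending)


by_first = sort_by_column(names, contribs, 0)   # sort on the first column
by_second = sort_by_column(names, contribs, 1)  # sort on the second column

for name, values in by_first:
    print(name, values)
```

Passing reverse=True to sorted gives the largest-first order shown in the question's expected output; leaving it out sorts in ascending order.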

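Filtering is a separate task. The way that is easiest to read and debug later is to check the value inside the for loop:

for name, values in sorted(zip(importances, contributions[0]), key=lambda pair: pair[1][0]):
    if abs(values[0]) > 0.01:
        print(name, values)

Be careful with a chained comparison such as -0.01 < values[0] > 0.01: Python reads a < b > c as a < b and b > c, so it keeps only the values greater than 0.01. To keep everything outside the range, write not a <= b <= c or (in your case) abs(b) > 0.01.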
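The difference between the chained comparison -0.01 < v > 0.01 and the absolute-value test can be checked directly. This is a small sketch under stated assumptions: four of the values are copied from the question's output, while 0.005 and -0.005 are made up here so that the filter has something to drop. The variable names are illustrative only.

```python
threshold = 0.01

# Four values from the question's output plus two made-up small ones.
values = [0.17162, 0.01889, 0.005, -0.005, -0.02422, -0.15412]

# Chained comparison: evaluated as (-threshold < v) and (v > threshold),
# which reduces to v > threshold and silently drops large negative values.
keep_chained = [v for v in values if -threshold < v > threshold]

# Absolute-value test: keeps everything outside [-threshold, threshold].
keep_abs = [v for v in values if abs(v) > threshold]

# Equivalent spelling with a negated range check.
keep_not_between = [v for v in values if not -threshold <= v <= threshold]

print(keep_chained)
print(keep_abs)
```

The absolute-value form states the intent in one expression, which is why it is usually the easier one to read later.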

