Here is my code, if you want to run it:
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from treeinterpreter import treeinterpreter as ti
import operator
X, y = make_classification(n_samples=1000,
                           n_features=6,
                           n_informative=3,
                           n_classes=2,
                           random_state=0,
                           shuffle=False)
# Create a DataFrame
df = pd.DataFrame({'Feature 1': X[:, 0],
                   'Feature 2': X[:, 1],
                   'Feature 3': X[:, 2],
                   'Feature 4': X[:, 3],
                   'Feature 5': X[:, 4],
                   'Feature 6': X[:, 5],
                   'Class': y})
y_train = df['Class']
X_train = df.drop('Class', axis=1)
rf = RandomForestClassifier(n_estimators=50,
                            random_state=0)
rf.fit(X_train, y_train)
importances = rf.feature_importances_
# Note: the next line overwrites the one above, so importances ends up
# holding the feature names (the column labels), not the importance scores
importances = X_train.columns
instances = X_train.iloc[[60]]
prediction, biases, contributions = ti.predict(rf, instances)
I tried to sort the list in two ways. First, using itemgetter:
for i in range(len(instances)):
    for c, feature in sorted(zip(contributions[i], importances), key=operator.itemgetter(1)):
        print(feature, np.round(c, 5))
Second, using key=lambda:
for i in range(len(instances)):
    for c, feature in sorted(zip(contributions[i], importances), key=lambda x: x[0].any()):
        print(feature, np.round(c, 5))
But both solutions produce the same output:
Feature 1 [ 0.16033 -0.16033]
Feature 2 [-0.02422 0.02422]
Feature 3 [-0.15412 0.15412]
Feature 4 [ 0.17162 -0.17162]
Feature 5 [ 0.02897 -0.02897]
Feature 6 [ 0.01889 -0.01889]
I want to sort the list by the first column of the output above. Any idea what I'm doing wrong?
It should be sorted by the values inside the brackets, not by feature name, like this:
Feature 4 [ 0.17162 -0.17162]
Feature 1 [ 0.16033 -0.16033]
Feature 5 [ 0.02897 -0.02897]
Feature 6 [ 0.01889 -0.01889]
Feature 2 [-0.02422 0.02422]
Feature 3 [-0.15412 0.15412]
And this is how the output should look if sorted by the second column:
Feature 3 [-0.15412 0.15412]
Feature 2 [-0.02422 0.02422]
Feature 6 [ 0.01889 -0.01889]
Feature 5 [ 0.02897 -0.02897]
Feature 1 [ 0.16033 -0.16033]
Feature 4 [ 0.17162 -0.17162]
Also, print the values only if they are greater than 0.01 or less than -0.01.
Answer 0 (score: 0)
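First, you need to get the data into a workable shape: contributions.shape is (1, 6, 2). Taking contributions[0] (the (6, 2) array for the single instance), you can easily iterate with zip:
zip(importances, contributions[0])
yields (name, [values]) pairs. Here is how to iterate and sort with a lambda that chains indexes: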
# Ascending by the first column; add reverse=True for the descending order shown in the question
for name, values in sorted(zip(importances, contributions[0]), key=lambda pair: pair[1][0]):
    print(name, values)
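The loop above sorts in ascending order, whereas both orderings the question asks for are descending. A minimal sketch, assuming the contributions for instance 60 are exactly the values printed in the question (hard-coded here as plain lists in place of the NumPy array contributions[0]) and using a hypothetical helper sort_by_column, shows that reverse=True reproduces both expected outputs:

```python
# Contributions for instance 60, copied from the output in the question.
# Plain lists stand in for contributions[0], the (6, 2) NumPy array.
importances = ["Feature 1", "Feature 2", "Feature 3",
               "Feature 4", "Feature 5", "Feature 6"]
contribs = [[0.16033, -0.16033], [-0.02422, 0.02422], [-0.15412, 0.15412],
            [0.17162, -0.17162], [0.02897, -0.02897], [0.01889, -0.01889]]


def sort_by_column(names, values, column, descending=True):
    """Hypothetical helper: return (name, values) pairs ordered by values[column]."""
    return sorted(zip(names, values),
                  key=lambda pair: pair[1][column],  # pair is (name, values)
                  reverse=descending)


by_first = sort_by_column(importances, contribs, 0)   # Feature 4 comes first
by_second = sort_by_column(importances, contribs, 1)  # Feature 3 comes first

for name, values in by_first:
    print(name, values)
```

With the real data you would pass contributions[0] instead of contribs; the key is unchanged because a NumPy row supports the same [0] and [1] indexing.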
The lambda receives a (name, values) pair: [1] selects values, and [0] then selects the first-column value.
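The same indexing idea also explains why both attempts in the question kept the original order. The sketch below rebuilds the question's (contribution, name) pairs from its printed output, again as plain lists instead of NumPy rows. Because of that, it uses the builtin any(x[0]) in place of x[0].any(); both return a single True when any element is non-zero, which is an assumption worth checking against your own arrays.

```python
import operator

# (contribution, name) pairs in the order the question's zip() produced them.
# Values are copied from the question's output; plain lists stand in for NumPy rows.
pairs = list(zip([[0.16033, -0.16033], [-0.02422, 0.02422], [-0.15412, 0.15412],
                  [0.17162, -0.17162], [0.02897, -0.02897], [0.01889, -0.01889]],
                 ["Feature 1", "Feature 2", "Feature 3",
                  "Feature 4", "Feature 5", "Feature 6"]))

# itemgetter(1) picks the feature name, so this sorts alphabetically by name,
# which for "Feature 1".."Feature 6" is the original order.
by_name = sorted(pairs, key=operator.itemgetter(1))

# any(...) collapses each row to one bool; every key is True here, and since
# sorted() is stable, the original order is preserved.
by_any = sorted(pairs, key=lambda x: any(x[0]))

# Indexing into the contribution row selects the first column, as intended.
by_first_column = sorted(pairs, key=lambda x: x[0][0], reverse=True)
```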
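Filtering is a separate task. The easiest way to keep the code readable and debuggable later is to simply check the values inside the for loop: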
for name, values in sorted(zip(importances, contributions[0]), key=lambda pair: pair[1][0]):
    # Caution: this chained comparison only keeps values above 0.01 (see below)
    if -0.01 < values[0] > 0.01:
        print(name, values)
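Note that a < b > c is a chained comparison meaning a < b and b > c, so -0.01 < b > 0.01 reduces to b > 0.01 and values below -0.01 are dropped. To keep values on both sides, switch it to not -0.01 <= b <= 0.01 or, in your case, abs(b) > 0.01.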
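A small sketch contrasts the chained comparison with the abs() check and combines the filter with the descending sort. As before, it assumes the question's printed values, hard-coded as plain lists in place of contributions[0]. For these six values every first-column contribution exceeds 0.01 in magnitude, so the abs() version keeps all of them, while the chained version silently drops Feature 2 and Feature 3.

```python
# Contributions copied from the question's output; plain lists stand in for
# contributions[0]. THRESHOLD is the 0.01 cut-off the question asks for.
importances = ["Feature 1", "Feature 2", "Feature 3",
               "Feature 4", "Feature 5", "Feature 6"]
contribs = [[0.16033, -0.16033], [-0.02422, 0.02422], [-0.15412, 0.15412],
            [0.17162, -0.17162], [0.02897, -0.02897], [0.01889, -0.01889]]
THRESHOLD = 0.01

# Chained comparison: -T < x > T means (-T < x) and (x > T), i.e. just x > T.
chained = [name for name, values in zip(importances, contribs)
           if -THRESHOLD < values[0] > THRESHOLD]

# abs() keeps values above T or below -T; sorted descending by the first column.
kept = [(name, values)
        for name, values in sorted(zip(importances, contribs),
                                   key=lambda pair: pair[1][0], reverse=True)
        if abs(values[0]) > THRESHOLD]

for name, values in kept:
    print(name, values)
```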