I am trying to run this code:
perm = PermutationImportance(clf).fit(X_test, y_test)
eli5.show_weights(perm)
to find out which features are most important in my model, but the only output is
<IPython.core.display.HTML object>
Is there a solution or workaround for this problem?
Thanks for any suggestions!
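Answer 0 (score: 2)
(Spyder maintainer here) Sorry, but at the moment (February 2019) there is no workaround or solution for displaying web content in the console.
Note: We are thinking about how to make this possible, but it most likely won't happen until 2020.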
Answer 1 (score: 1)
Thanks for the idea, J Hudok. Here is my working example:
from sklearn.datasets import load_iris
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
import eli5
from eli5.sklearn import PermutationImportance
from sklearn.model_selection import train_test_split
import webbrowser
# Load iris data & convert to dataframe
iris_data = load_iris()
data = pd.DataFrame({
    'sepal length': iris_data.data[:, 0],
    'sepal width': iris_data.data[:, 1],
    'petal length': iris_data.data[:, 2],
    'petal width': iris_data.data[:, 3],
    'species': iris_data.target
})
X = data[['sepal length', 'sepal width', 'petal length', 'petal width']]
y = data['species']
# Split train & test dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)
# Initialize classifier
clf = RandomForestClassifier(n_estimators=56, max_depth=8, random_state=1, verbose=1)
clf.fit(X_train, y_train)
# Compute permutation feature importance
perm_importance = PermutationImportance(clf, random_state=0).fit(X_test, y_test)
# Store feature weights in an object
html_obj = eli5.show_weights(perm_importance, feature_names = X_test.columns.tolist())
# Write the HTML to a file (adjust the file path; a Windows path is used here)
with open(r'C:\Tmp\Desktop\iris-importance.htm', 'wb') as f:
    f.write(html_obj.data.encode("UTF-8"))
# Open the stored HTML file in the default browser
url = r'C:\Tmp\Desktop\iris-importance.htm'
webbrowser.open(url, new=2)
Answer 2 (score: 0)
A crude workaround is simply to write the HTML to a file and open it in a browser:
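import webbrowser
# htmlobj is the object returned by eli5.show_weights(...); its HTML string is in .data
with open(r'C:\Temp\disppage.htm', 'wb') as f:  # Use some reasonable temp name
    f.write(htmlobj.data.encode("UTF-8"))
# Open the HTML file on my own (Windows) computer
url = r'C:\Temp\disppage.htm'
webbrowser.open(url, new=2)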
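The hard-coded Windows paths in the answers above can be avoided by letting Python pick the temporary file name. The following is only a sketch of that idea using the standard library. `show_html_in_browser` and its `open_browser` flag are hypothetical names introduced here, not part of eli5 or Spyder, and the HTML string is a stand-in for the `.data` attribute of the object returned by `eli5.show_weights(...)`.

```python
import tempfile
import webbrowser
from pathlib import Path


def show_html_in_browser(html_text, open_browser=True):
    """Write html_text to a temporary .htm file and optionally open it.

    Hypothetical helper for illustration; not part of eli5 or Spyder.
    Returns the path of the file that was written.
    """
    # delete=False keeps the file on disk after the handle is closed,
    # so the browser can still read it.
    with tempfile.NamedTemporaryFile(
        mode="w", suffix=".htm", delete=False, encoding="utf-8"
    ) as f:
        f.write(html_text)
        path = Path(f.name)
    if open_browser:
        # as_uri() builds a file:// URL from the absolute path
        webbrowser.open(path.as_uri(), new=2)
    return path


# Stand-in HTML; with eli5 this would be the .data string of the
# object returned by eli5.show_weights(...).
page = show_html_in_browser("<p>feature weights</p>", open_browser=False)
print(page)
```

Passing `open_browser=False` only writes the file, which is handy when the script runs somewhere without a browser. The temporary file is not deleted automatically, so it may need to be cleaned up by hand.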
Answer 3 (score: 0)
I found a solution that works in Spyder:
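clf.fit(X_train, y_train)
# Names of the one-hot encoded columns (newer scikit-learn versions rename this method to get_feature_names_out)
onehot = clf.named_steps['preprocessor'].named_transformers_['cat'].named_steps['onehot']
onehot_columns = list(onehot.get_feature_names(input_features=categorical_features))
# Numeric feature names followed by the one-hot column names
numeric_features_list = list(numeric_features)
numeric_features_list.extend(onehot_columns)
numeric_features_list = np.array(numeric_features_list)
# Keep only the features retained by the feature-selection step
selected_features_bool = list(clf.named_steps['feature_selection'].get_support(indices=False))
numeric_features_list = list(numeric_features_list[selected_features_bool])
# Return the weights as a pandas DataFrame instead of an HTML object
eli5.format_as_dataframe(eli5.explain_weights(clf.named_steps['classification'], top=50, feature_names=numeric_features_list))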
As a result, it gives me the output as a DataFrame:
0 region_BAKI 0.064145
1 call_out_offnet_dist_w1 0.025365
2 trf_Bolge 0.022637
3 call_in_offnet_dist_w1 0.018974
4 device_os_name_Proprietary 0.018608
...
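The DataFrame route works because the result is plain text, which the Spyder console can always print. As a rough illustration of the same idea without pandas, the sketch below formats (feature, weight) pairs as an aligned text table. `format_weights` is a hypothetical helper, not an eli5 function, and the sample rows are copied from the output shown above.

```python
def format_weights(rows):
    """Format (feature, weight) pairs as an aligned plain-text table.

    Hypothetical helper for illustration; not an eli5 function.
    """
    # Pad every feature name to the width of the longest one
    width = max(len(name) for name, _ in rows)
    lines = [f"{name:<{width}}  {weight:.6f}" for name, weight in rows]
    return "\n".join(lines)


# The first rows of the output shown above
rows = [
    ("region_BAKI", 0.064145),
    ("call_out_offnet_dist_w1", 0.025365),
    ("trf_Bolge", 0.022637),
]
table = format_weights(rows)
print(table)
```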