I have several vectors, for example:
s1:1;s2:1;s3:0;s4:0;s5:0;s6:0;s7:0;s8:1;s9:0;s10:0;s11:1;s12:0;s13:0;s14:0;s15:0;p:1
s1:1;s2:1;s3:0;s4:0;s5:0;s6:0;s7:0;s8:1;s9:0;s10:1;s11:0;s12:0;s13:0;s14:0;s15:0;p:0
s1:1;s2:1;s3:0;s4:0;s5:0;s6:0;s7:0;s8:1;s9:1;s10:0;s11:0;s12:0;s13:0;s14:0;s15:0;p:0
s1:1;s2:1;s3:0;s4:0;s5:0;s6:0;s7:1;s8:0;s9:0;s10:0;s11:0;s12:0;s13:0;s14:0;s15:1;p:1
Based on the last entry (p), I want to determine which of the components s1 ... s15 are most important for determining the outcome of p.
I am looking for a machine learning algorithm that can "understand" the correlation between several variables. If, for example, s2 = s4 = 1 = p
always holds, then the algorithm should give me something like: s2 = 1.0, s4 = 1.0.
Variables that are less important for determining the outcome of p should receive values between 0.0 and 1.0.
My code so far:
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.decomposition import PCA
iris = datasets.load_iris()
X = iris.data
y = iris.target
y = y[1:50]
X = X[1:50]
target_names = iris.target_names
pca = PCA(n_components=4)
X_r = pca.fit(X).transform(X)
plt.figure()
for c, i, target_name in zip("rgb", [0, 1, 2], target_names):
    plt.scatter(X_r[y == i, 0], X_r[y == i, 1], c=c, label=target_name)
plt.legend()
plt.title('PCA of IRIS dataset')
plt.show()
# Matlab code to achieve what I want. See also:
# http://de.mathworks.com/help/stats/pca.html
# [a,b,c]= pca(X);
# c
Answer 0 (score: 3)
This is my solution, based on Ryan's comment and his code link (above):
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier
# Build a forest and compute the feature importances
forest = ExtraTreesClassifier(n_estimators=250,
                              random_state=0)
X = np.array([
[0,0,0,0,1,1,1,0,1,0,0,0,0,0,0],
[0,0,0,0,1,1,1,1,0,0,0,0,0,0,0],
[0,0,0,1,0,0,0,0,0,0,0,0,1,1,1],
[0,0,0,1,0,0,0,0,0,0,0,1,0,1,1]])
y = np.array([1, 0, 0, 1])
forest.fit(X, y)
#array with importances of each feature
importances = forest.feature_importances_
print(importances)
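For completeness, the same approach can be applied directly to the semicolon-separated rows from the question. The sketch below parses each "s1:1;...;p:1" line into a feature vector and label, fits an `ExtraTreesClassifier`, and prints one importance score per component; the `parse` helper is hypothetical (not part of scikit-learn), and with only four training rows the scores are merely illustrative.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier

# The four example rows from the question, in their original format.
rows = [
    "s1:1;s2:1;s3:0;s4:0;s5:0;s6:0;s7:0;s8:1;s9:0;s10:0;s11:1;s12:0;s13:0;s14:0;s15:0;p:1",
    "s1:1;s2:1;s3:0;s4:0;s5:0;s6:0;s7:0;s8:1;s9:0;s10:1;s11:0;s12:0;s13:0;s14:0;s15:0;p:0",
    "s1:1;s2:1;s3:0;s4:0;s5:0;s6:0;s7:0;s8:1;s9:1;s10:0;s11:0;s12:0;s13:0;s14:0;s15:0;p:0",
    "s1:1;s2:1;s3:0;s4:0;s5:0;s6:0;s7:1;s8:0;s9:0;s10:0;s11:0;s12:0;s13:0;s14:0;s15:1;p:1",
]

def parse(row):
    # Split "name:value" pairs; the last entry (p) is the label.
    values = [int(pair.split(":")[1]) for pair in row.split(";")]
    return values[:-1], values[-1]

parsed = [parse(r) for r in rows]
X = np.array([features for features, _ in parsed])
y = np.array([label for _, label in parsed])

forest = ExtraTreesClassifier(n_estimators=250, random_state=0)
forest.fit(X, y)

# feature_importances_ sums to 1.0 across all features; constant
# features (never used in a split) get an importance of 0.
for name, importance in zip(["s%d" % i for i in range(1, 16)],
                            forest.feature_importances_):
    print(name, importance)
```

Note that PCA (as in the question's code) only decomposes variance in X and ignores p entirely, which is why a supervised importance measure like this is the better fit here.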