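The Gini function defined below is giving me trouble. I suspect the problem is the shape of the data I am passing to it, but I have not been able to resolve it. This is the error I get: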
Traceback (most recent call last):
  File "/Users/mas/Documents/workspace/LibertyMutual2015/Aug2_MWE.py", line 63, in <module>
    mse.append(Gini(test_fold.target, pred))
  File "/Users/mas/Documents/workspace/LibertyMutual2015/Aug2_MWE.py", line 18, in Gini
    true_order = arr[arr[:,0].argsort()][::-1,0]
IndexError: too many indices
Here is the code:
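import numpy as np  # np is used below; imported explicitly rather than relying on the pandas star import
import pandas as pd
from pandas import *
from sklearn import ensemble
from sklearn.cross_validation import *
import random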
def Gini(y_true, y_pred):
    # check and get number of samples
    assert y_true.shape == y_pred.shape
    n_samples = y_true.shape[0]
    # sort rows on prediction column
    # (from largest to smallest)
    arr = np.array([y_true, y_pred]).transpose()
    true_order = arr[arr[:,0].argsort()][::-1,0]
    pred_order = arr[arr[:,1].argsort()][::-1,0]
    # get Lorenz curves
    L_true = np.cumsum(true_order) / np.sum(true_order)
    L_pred = np.cumsum(pred_order) / np.sum(pred_order)
    L_ones = np.linspace(0, 1, n_samples)
    # get Gini coefficients (area between curves)
    G_true = np.sum(L_ones - L_true)
    G_pred = np.sum(L_ones - L_pred)
    # normalize to true Gini coefficient
    return G_pred/G_true

features = np.random.randint(0,10,size=[100,5])
target = np.random.randint(0,2,size=100)
df = DataFrame(features)
df['target'] = target
#print df.head()
kf = KFold(df.shape[0], n_folds=10)
mse = []
fold_count = 0
for train, test in kf:
    print("Processing fold %s" % fold_count)
    train_fold = df.ix[train]
    test_fold = df.ix[test]
    features = [col for col in df.columns if col not in ['target']]
    # Get training examples
    train_fold_input = train_fold[features].values
    train_fold_output = train_fold['target']
    # Fit RandomForest
    cfr = ensemble.RandomForestClassifier(n_estimators = 500, n_jobs = -1)
    cfr.fit(train_fold_input, train_fold_output)
    # Check MSE on test set
    pred = cfr.predict(test_fold[features])
    print(test_fold.target)
    print(pred)
    mse.append(Gini(test_fold.target, pred))
    # Done with the fold
    fold_count += 1

Answer 0 (score: 0)
Did you perhaps want an extra ,: in the index on that line and the next one?

    true_order = arr[arr[:,0].argsort(), :][::-1,0]
    pred_order = arr[arr[:,1].argsort(), :][::-1,0]
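
It may also be worth checking arr.shape before indexing. "too many indices" on arr[:,0] means arr has fewer than two dimensions at that point, and the inner arr[:,0] is evaluated before any index that follows argsort(). Why np.array([y_true, y_pred]) ends up that way is not something the traceback shows; it likely depends on the input types and library versions. The sketch below assumes the inputs can be coerced to flat numeric arrays and builds the two-column array explicitly. The name gini_normalized and the use of np.column_stack are my own choices for illustration, not part of the original code.

```python
import numpy as np

# A 1-D array cannot take two indices; this raises the same class of
# error as the traceback in the question.
one_d = np.arange(4)
try:
    one_d[:, 0]
except IndexError as exc:
    print("IndexError:", exc)


def gini_normalized(y_true, y_pred):
    """Normalized Gini, keeping the structure of the question's Gini()."""
    # Assumption: both inputs (Series, list or ndarray) hold plain numbers,
    # so they can be coerced to flat float arrays.
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    assert y_true.shape == y_pred.shape
    n_samples = y_true.shape[0]

    # Build an explicit (n_samples, 2) array:
    # column 0 = true values, column 1 = predictions.
    arr = np.column_stack((y_true, y_pred))
    assert arr.shape == (n_samples, 2)

    # Sort rows by each column, largest first, and keep the true values.
    true_order = arr[arr[:, 0].argsort()][::-1, 0]
    pred_order = arr[arr[:, 1].argsort()][::-1, 0]

    # Lorenz curves
    l_true = np.cumsum(true_order) / np.sum(true_order)
    l_pred = np.cumsum(pred_order) / np.sum(pred_order)
    l_ones = np.linspace(0, 1, n_samples)

    # Gini coefficients, normalized by the Gini of the true ordering
    g_true = np.sum(l_ones - l_true)
    g_pred = np.sum(l_ones - l_pred)
    return g_pred / g_true
```

In the loop from the question this would be called as gini_normalized(test_fold.target, pred). If the shape assertion fails, printing the shapes of both inputs should show which one is not what was expected.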