Question

I'm trying to do a stratified shuffle split, and I'm new to this.

from sklearn import preprocessing
from sklearn import cross_validation
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score
from sklearn.metrics import precision_score
from sklearn.metrics import recall_score
from sklearn.model_selection import StratifiedShuffleSplit

data = featureFormat(my_dataset, features_list, sort_keys = True)
labels, features = targetFeatureSplit(data)
scaler = preprocessing.MinMaxScaler()
features = scaler.fit_transform(features)


split = StratifiedShuffleSplit(n_splits=3, test_size=0.2, random_state=42)
print(len(features), len(labels))
for train_index,test_index in split.split(features, labels):
    print("TRAIN:", train_index, "TEST:", test_index)
    features_train,features_test = features_train[train_index],features_test[test_index] 
    labels_train,labels_test = labels_train[train_index],labels_test[test_index]

This is the error I get:

It reports index 100 as out of bounds, even though the limit goes up to 143.

Answer 1

You are using the wrong variable names in these lines:

features_train,features_test = features_train[train_index],features_test[test_index] 
labels_train,labels_test = labels_train[train_index],labels_test[test_index]

They should be:

features_train,features_test = features[train_index],features[test_index] 
labels_train,labels_test = labels[train_index],labels[test_index]

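You are basically using these variables before the loop has assigned them. You need to slice the original features and labels instead. If features_train and the others are left over from an earlier split, they have fewer rows than features, so indices generated for the full data set can run out of bounds.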
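
For reference, here is a minimal self-contained sketch of the corrected loop. The data is synthetic (143 random rows with made-up labels) and only stands in for the output of featureFormat and targetFeatureSplit, which the question does not show. It also assumes labels may be a plain Python list; in that case it has to be converted to a NumPy array before it can be indexed with train_index. The `stale` array at the end is hypothetical and imitates a 100-row array left over from an earlier split, to show how the out-of-bounds error could arise.

```python
import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for the asker's data: 143 samples, 3 features,
# and an imbalanced binary label. All numbers here are made up.
rng = np.random.RandomState(0)
features = MinMaxScaler().fit_transform(rng.rand(143, 3))
labels = [1.0] * 18 + [0.0] * 125  # a plain Python list

# A list cannot be indexed with an index array, so convert it once.
labels = np.asarray(labels)

split = StratifiedShuffleSplit(n_splits=3, test_size=0.2, random_state=42)
fold_sizes = []
for train_index, test_index in split.split(features, labels):
    # The indices refer to rows of the full arrays, so slice those.
    features_train, features_test = features[train_index], features[test_index]
    labels_train, labels_test = labels[train_index], labels[test_index]
    fold_sizes.append((len(features_train), len(features_test)))

print(fold_sizes)

# Hypothetical leftover from an earlier split: only 100 rows.
stale = features[:100]
try:
    stale[train_index]  # train_index contains row numbers above 99
except IndexError as exc:
    print("IndexError:", exc)
```

Every fold's train and test parts together cover all 143 rows, while indexing the shorter `stale` array with the same indices fails.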

I am doing a stratified split and getting an out-of-bounds error, and I can't figure out why