I'm trying to do a stratified shuffle split, and I'm new to this.
from sklearn import preprocessing
from sklearn import cross_validation
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score
from sklearn.metrics import precision_score
from sklearn.metrics import recall_score
from sklearn.model_selection import StratifiedShuffleSplit
data = featureFormat(my_dataset, features_list, sort_keys = True)
labels, features = targetFeatureSplit(data)
scaler = preprocessing.MinMaxScaler()
features = scaler.fit_transform(features)
split = StratifiedShuffleSplit(n_splits=3, test_size=0.2, random_state=42)
print len(features), len(labels)
for train_index, test_index in split.split(features, labels):
    print("TRAIN:", train_index, "TEST:", test_index)
    features_train, features_test = features_train[train_index], features_test[test_index]
    labels_train, labels_test = labels_train[train_index], labels_test[test_index]
This is the error I get:
It reports 100 even though the limit goes up to 143.
Answer 0 (score: 1)
You are using the wrong variable names. These lines:
features_train,features_test = features_train[train_index],features_test[test_index]
labels_train,labels_test = labels_train[train_index],labels_test[test_index]
should be:
features_train,features_test = features[train_index],features[test_index]
labels_train,labels_test = labels[train_index],labels[test_index]
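You are essentially using the variables before they have been defined. Inside the loop you need to slice the original features and labels with the index arrays that split.split() yields.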
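For reference, here is a minimal, self-contained sketch of the corrected loop. It does not use featureFormat or targetFeatureSplit; the data is a synthetic stand-in (143 samples, 3 features, an imbalanced binary label), and the name fold_sizes is only there for illustration. One assumption worth flagging: if labels is a plain Python list, indexing it with a NumPy index array will likely raise a TypeError, so the sketch converts it with np.asarray first.

```python
import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for the question's data (hypothetical values):
# 143 samples, 3 features, imbalanced binary labels.
rng = np.random.RandomState(0)
features = rng.rand(143, 3)
labels = [1] * 18 + [0] * 125  # plain list, assumed to mimic the helper's output

features = MinMaxScaler().fit_transform(features)
labels = np.asarray(labels)  # a list cannot be indexed with an index array

split = StratifiedShuffleSplit(n_splits=3, test_size=0.2, random_state=42)

fold_sizes = []  # illustrative: (train size, test size) per split
for train_index, test_index in split.split(features, labels):
    # Slice the ORIGINAL arrays, not the not-yet-defined *_train / *_test names.
    features_train, features_test = features[train_index], features[test_index]
    labels_train, labels_test = labels[train_index], labels[test_index]
    fold_sizes.append((len(train_index), len(test_index)))

print(fold_sizes)
```

Note that after the loop finishes, features_train and the related names hold only the last of the three splits. If you want to evaluate on every split, fit and score the classifier inside the loop.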