TypeError: 'set' object is not subscriptable (3 CSV files)

Asked: 2019-12-01 23:46:56

Tags: python pandas scikit-learn

When I try to build my data sets, I get the error "TypeError: 'set' object is not subscriptable".

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

dataDir = '/content/drive/My Drive/Colab Notebooks/HW 3/' # Directory with input files
trainFile = 'q2train.csv' # Training examples
labelFile = 'q2label.csv' # Test label
validFile = 'q2valid.csv' # Valid Files

train = pd.read_csv(dataDir+trainFile)
valid = pd.read_csv(dataDir+validFile)
label = pd.read_csv(dataDir+labelFile)

data_sets = {
      'train',
      'label',
      'valid'}

def get_data(data_set_name, test_prop=0.2, seed=2019):
  """returns data for training, testing, and data characteristics"""
  data = data_sets[data_set_name]
  X, y = data.data, data.target
  X_train, X_test, y_train, y_test = train_test_split(X, y, 
                                                      test_size=test_prop, 
                                                      random_state=seed)
  nF = X.shape[1] # number of features
  nC = len(np.unique(y)) # number of classes
  nTrain, nTest = len(y_train), len(y_test)
  print("\nData set: %s" %data_set_name)
  print("\tNumber of features %d" %nF)
  print("\tNumber of output classes = %d" %(nC))
  print("\tNumber of training examples = %d" %(nTrain))
  print("\tNumber of testing examples = %d" %(nTest))
  return X_train, X_test, y_train, y_test, nF, nC, nTrain, nTest

for name in data_sets:
  X_train, X_test, y_train, y_test, nF, nC, nTrain, nTest = get_data(name)

Any help would be greatly appreciated. Thank you in advance.

1 Answer:

Answer 0 (score: 1)

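Use a dictionary. Braces that contain only values, as in your data_sets, create a set, and a set cannot be indexed by key. Map each name to its DataFrame instead: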

train = pd.read_csv(dataDir+trainFile)
valid = pd.read_csv(dataDir+validFile)
label = pd.read_csv(dataDir+labelFile)

data_sets = {
    'train': train,
    'label': label,
    'valid': valid
}

Then data_sets[data_set_name] will retrieve the data set you want.
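
To see the difference in isolation, here is a minimal sketch that uses plain lists as stand-ins for the DataFrames (the list contents and the names names_only and by_name are made up for illustration, not taken from the question). It reproduces the same TypeError with a set and then shows the lookup working with a dict:

```python
# A brace literal holding only values builds a set; sets do not support [] lookup.
names_only = {'train', 'label', 'valid'}

try:
    names_only['train']
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# With key: value pairs, the same braces build a dict that maps each name
# to an object. The lists below are placeholders for the real DataFrames.
by_name = {
    'train': [1, 2, 3],
    'label': [0, 1, 0],
    'valid': [4, 5],
}

print(type(names_only).__name__, type(by_name).__name__)
print(error_message)
print(by_name['train'])

# Iterating over a dict yields its keys, so a loop such as
# "for name in data_sets: get_data(name)" still receives the names.
for name in by_name:
    print(name, len(by_name[name]))
```

One further point that may matter once the lookup works: .data and .target are attributes of the dataset objects that scikit-learn's loaders return, and a DataFrame from pd.read_csv will most likely not have them unless the CSV happens to contain columns with those names, so get_data may need to select its features and labels differently.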