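I am a Python beginner. I wrote the function below to partition data read from a CSV file. The indices are generated without errors, but when I split df using these indices, the result is incorrect. What is wrong with my code?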

    import numpy as np
    import pandas as pd

    def partition(k, number_of_fold):
        names = ['Mcg', 'Gvh', 'Alm', 'Mit', 'Erl', 'Pox', 'Vac', 'Nuc', 'class']
        file = 'yeast3.dat'
        df = pd.read_csv(file, header=None, names=names)
        print(df.ix[1:2])
        print(df.ix[1:2, 3:4])
        print('size: ' + str(df.size))
        fold_zize = df.size / k
        for i in range(k):
            # Test fold: rows from start_test to start_test + fold_zize
            start_test = i * fold_zize
            x_test = np.array(df.ix[start_test: (start_test + fold_zize), 0:8])
            y_test = np.array(df.ix[start_test: (start_test + fold_zize), 8:9])
            print("test = " + str(start_test) + " : " + str(start_test + fold_zize))
            # Training set: everything before and after the test fold
            x_train = np.concatenate(
                (np.array(df.ix[: start_test, 0:8]), np.array(df.ix[start_test + fold_zize:, 0:8])))
            y_train = np.concatenate(
                (np.array(df.ix[: start_test, 8:9]), np.array(df.ix[start_test + fold_zize:, 8:9])))
            print("train1 = 0 : " + str(start_test))
            print("train2 = " + str((start_test + fold_zize)) + " : " + str(df.size))
            # Sanity check: train and test together should cover the whole df
            if(x_train.size + x_test.size != df.size):
                print('EROOOOOOOOOOOOOOOOOOOOOOOR: ' + str(x_train.size + x_test.size) + ' ' + str(df.size))

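The ranges printed for the train and test arrays look correct, but in the if statement the sum of the train and test sizes does not equal the size of the main df.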
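
For reference, here is a minimal sketch of how the size check behaves on a small, made-up DataFrame. It is not the original data: the 10-row frame is a hypothetical stand-in for yeast3.dat, and the rewritten `partition(df, k)` signature is an assumption for illustration. Two things are worth noting. `df.size` is the number of cells (rows times columns), not the number of rows, so it is compared here against `len(df)` instead. Also, `.ix` was removed in pandas 1.0, so the sketch uses `.iloc`, which slices by position and excludes the end of the range.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for yeast3.dat: 10 rows, 8 feature columns plus 'class'.
names = ['Mcg', 'Gvh', 'Alm', 'Mit', 'Erl', 'Pox', 'Vac', 'Nuc', 'class']
df = pd.DataFrame(np.arange(90).reshape(10, 9), columns=names)

print(df.size)   # number of cells: rows * columns
print(len(df))   # number of rows


def partition(df, k):
    """Yield (x_train, y_train, x_test, y_test) for k contiguous folds."""
    n_rows = len(df)
    fold_size = n_rows // k  # whole rows per fold; the last fold takes any remainder
    for i in range(k):
        start = i * fold_size
        stop = n_rows if i == k - 1 else start + fold_size
        test = df.iloc[start:stop]                            # end-exclusive, positional
        train = pd.concat([df.iloc[:start], df.iloc[stop:]])  # rows outside the test fold
        yield (train.iloc[:, :8].to_numpy(), train.iloc[:, 8].to_numpy(),
               test.iloc[:, :8].to_numpy(), test.iloc[:, 8].to_numpy())


for x_train, y_train, x_test, y_test in partition(df, 3):
    # Compare row counts, not element counts: x_* holds only 8 of the 9 columns.
    assert x_train.shape[0] + x_test.shape[0] == len(df)
```

Comparing `x_train.size + x_test.size` with `df.size` would still differ even for a correct split, because the feature arrays leave out the class column.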