I'm new to Python, so apologies in advance if this is too simple. I couldn't find anything on this, and this question didn't help.
My code is:
from collections import Counter
from imblearn.over_sampling import SMOTE
# Split data
y = starbucks_smote.iloc[:, -1]
X = starbucks_smote.drop('label', axis=1)
# Count labels by type
counter = Counter(y)
print(counter)
# Counter({0: 9634, 1: 2895})
# Transform the dataset
oversample = SMOTE()
X, y = oversample.fit_resample(X, y)
# Count labels again after oversampling
counter = Counter(y)
print(counter)
# Counter({0: 9634, 1: 9634})
How can I save the oversampled dataset for later use?
I tried:
data_res = np.concatenate((X, y), axis = 1)
data_res.to_csv('sample_smote.csv')
and got this error:
ValueError: all the input arrays must have same number of dimensions,
but the array at index 0 has 2 dimension(s) and the array at index 1 has 1 dimension(s)
Thanks for any hints!
Answer (score: 2)
You can build a DataFrame:
import pandas as pd
data_res = pd.DataFrame(X)
data_res['y'] = y
and then save data_res to a CSV file.
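A minimal end-to-end sketch of the DataFrame approach, including reading the file back. It assumes the target column is called label (as in the question's drop('label', ...)) and uses a small made-up X and y in place of the real SMOTE output. The temporary directory is only there so the snippet runs anywhere; in practice you would pass 'sample_smote.csv' directly.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Hypothetical stand-ins for the X and y returned by SMOTE().fit_resample(X, y).
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(6, 3)), columns=["f1", "f2", "f3"])
y = pd.Series([0, 1, 0, 1, 0, 1], name="label")

# Put features and target side by side in one DataFrame.
# .to_numpy() assigns by position, so a mismatched index cannot misalign rows.
data_res = X.copy()
data_res["label"] = y.to_numpy()

# Temporary location for this demo; normally just 'sample_smote.csv'.
path = os.path.join(tempfile.mkdtemp(), "sample_smote.csv")

# index=False keeps pandas from writing the row index as an extra column.
data_res.to_csv(path, index=False)

# Later: read the file back and split it into features and target again.
loaded = pd.read_csv(path)
X_loaded = loaded.drop("label", axis=1)
y_loaded = loaded["label"]

print(loaded.shape)  # (6, 4)
```

Passing index=False matters when the CSV is read back: without it, read_csv returns an extra "Unnamed: 0" column holding the old index.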
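A solution based on concatenating NumPy arrays also works, but y first has to be turned into a column with np.vstack so that the number of dimensions matches: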
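import numpy as np
import pandas as pd
# np.vstack(y) turns the 1-D y of length n into an (n, 1) column
data_res = np.concatenate((X, np.vstack(y)), axis=1)
# A NumPy array has no .to_csv(), so wrap it in a DataFrame before saving
data_res = pd.DataFrame(data_res)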
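To see why the original attempt failed and what np.vstack fixes, here is a small sketch with made-up arrays (not the question's data). It reproduces the dimension error, checks that np.vstack(y) gives the same (n, 1) column as the more explicit y.reshape(-1, 1), and shows np.column_stack as a shortcut that accepts the 1-D target directly. The column names are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical small arrays standing in for the resampled X (2-D) and y (1-D).
X = np.arange(12, dtype=float).reshape(4, 3)
y = np.array([0, 1, 0, 1])

print(X.ndim, y.ndim)  # 2 1

# Concatenating a 2-D and a 1-D array reproduces the question's error.
try:
    np.concatenate((X, y), axis=1)
except ValueError as err:
    print("ValueError:", err)

# np.vstack(y) turns the 1-D y of length n into an (n, 1) column ...
col_vstack = np.vstack(y)
# ... which is the same as the more explicit reshape(-1, 1).
col_reshape = y.reshape(-1, 1)
assert np.array_equal(col_vstack, col_reshape)

data_res = np.concatenate((X, col_reshape), axis=1)  # shape (4, 4)

# np.column_stack does the 1-D -> column conversion itself.
assert np.array_equal(data_res, np.column_stack((X, y)))

# The result is still a plain ndarray, so wrap it before calling .to_csv().
df = pd.DataFrame(data_res, columns=["f1", "f2", "f3", "label"])
```

Note that concatenating into a single ndarray upcasts the integer labels to float; the DataFrame route keeps the label column as integers.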