I have two DataFrames (X and y) sliced from a main DataFrame df, as shown below:
X = df.ix[:,df.columns!='Class']
y = df.ix[:,df.columns=='Class']
from imblearn.over_sampling import SMOTE
sm = SMOTE()
X_resampled , y_resampled = sm.fit_sample(X,y.values.ravel())
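The last line returns NumPy arrays rather than DataFrames: a 2D array for X_resampled and an array of labels for y_resampled.
So I would like to know how to convert X_resampled and y_resampled back into a DataFrame.
Example data: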
Answer 0 (score: 2)
I believe you need numpy.hstack:
import numpy as np
import pandas as pd

a = np.array([[ 0. , -1.35980713, -0.07278117, 2.53634674, 1.37815522,
-0.33832077, 0.46238778, 0.23959855, 0.0986979 , 0.36378697,
0.09079417, -0.55159953, -0.61780086, -0.99138985, -0.31116935,
1.46817697, -0.47040053, 0.20797124, 0.02579058, 0.40399296,
0.2514121 , -0.01830678, 0.27783758, -0.11047391, 0.06692807,
0.12853936, -0.18911484, 0.13355838, -0.02105305, 0.24496426],
[ 0. , 1.19185711, 0.26615071, 0.16648011, 0.44815408,
0.06001765, -0.08236081, -0.07880298, 0.08510165, -0.25542513,
-0.16697441, 1.61272666, 1.06523531, 0.48909502, -0.1437723 ,
0.63555809, 0.46391704, -0.11480466, -0.18336127, -0.14578304,
-0.06908314, -0.22577525, -0.63867195, 0.10128802, -0.33984648,
0.1671704 , 0.12589453, -0.0089831 , 0.01472417, -0.34247454]])
b = np.array([0, 100])
c = pd.DataFrame(np.hstack((a,b[:, None])))
print (c)
0 1 2 3 4 5 6 7 \
0 0.0 -1.359807 -0.072781 2.536347 1.378155 -0.338321 0.462388 0.239599
1 0.0 1.191857 0.266151 0.166480 0.448154 0.060018 -0.082361 -0.078803
8 9 ... 21 22 23 24 \
0 0.098698 0.363787 ... -0.018307 0.277838 -0.110474 0.066928
1 0.085102 -0.255425 ... -0.225775 -0.638672 0.101288 -0.339846
25 26 27 28 29 30
0 0.128539 -0.189115 0.133558 -0.021053 0.244964 0.0
1 0.167170 0.125895 -0.008983 0.014724 -0.342475 100.0
[2 rows x 31 columns]
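As the output above shows, np.hstack gives integer column labels and turns the class labels into floats (100.0), because a single NumPy array holds one dtype. If the original column names and the integer labels matter, one possible alternative is to build the two pieces separately and join them with pd.concat. This is only a sketch: the small X, X_resampled and y_resampled below are hypothetical stand-ins for the objects in the question, and the column names V1 and V2 are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the objects in the question.
X = pd.DataFrame({"V1": [0.1, 0.2, 0.3], "V2": [1.0, 2.0, 3.0]})
X_resampled = np.array([[0.1, 1.0], [0.2, 2.0], [0.3, 3.0], [0.25, 2.5]])
y_resampled = np.array([0, 0, 1, 1])

# Reuse the feature names from the original X.
X_df = pd.DataFrame(X_resampled, columns=X.columns)

# Keep the labels in their own column so they stay integers.
y_df = pd.DataFrame({"Class": y_resampled})

# Join side by side; both frames share the default 0..n-1 index.
resampled = pd.concat([X_df, y_df], axis=1)
print(resampled)
```

Both frames are created with the default RangeIndex, so concatenating along axis=1 lines the rows up one to one.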