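I have a training dataset with 500,000 rows and 58 columns. Running get_dummies on the training set worked and produced output, but when I do the same on the test set with the code below, I get a MemoryError: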
test_cols=list(test.select_dtypes(exclude=[np.number]).columns)
for cols in test_cols:
    test=pd.get_dummies(test,columns=[cols],drop_first=True)
Traceback (most recent call last):
  File "<ipython-input-42-a0040aaaad14>", line 3, in <module>
    test=pd.get_dummies(test,columns=[cols],drop_first=True)
  File "C:\Users\anesh\Anaconda3\envs\tensorflow\lib\site-packages\pandas\core\reshape\reshape.py", line 880, in get_dummies
    drop_first=drop_first, dtype=dtype)
  File "C:\Users\anesh\Anaconda3\envs\tensorflow\lib\site-packages\pandas\core\reshape\reshape.py", line 968, in _get_dummies_1d
    dummy_mat = np.eye(number_of_cols, dtype=dtype).take(codes, axis=0)
  File "C:\Users\anesh\Anaconda3\envs\tensorflow\lib\site-packages\numpy\lib\twodim_base.py", line 186, in eye
    m = zeros((N, M), dtype=dtype, order=order)
MemoryError
I have already tried the solutions posted on Stack Overflow, but they did not work. Can anyone suggest the best way to fix this?
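For context, the traceback fails inside np.eye(number_of_cols), which allocates a square matrix with one row and one column per distinct value of the column being encoded. That suggests, though it is only an assumption here, that at least one non-numeric column in the test set has a very large number of distinct values. A minimal diagnostic sketch follows; the small DataFrame, the helper name non_numeric_cardinality, and the MAX_LEVELS threshold are all hypothetical stand-ins, not part of the real data or code.

```python
import numpy as np
import pandas as pd


def non_numeric_cardinality(df):
    """Count distinct values in each non-numeric column, largest first."""
    cols = df.select_dtypes(exclude=[np.number]).columns
    return df[cols].nunique().sort_values(ascending=False)


# Hypothetical stand-in for the real test set.
test = pd.DataFrame({
    "id_like": [f"row{i}" for i in range(1000)],     # every value is distinct
    "color": ["red", "green", "blue", "red"] * 250,  # only three levels
    "amount": np.arange(1000, dtype=float),          # numeric, not encoded
})

card = non_numeric_cardinality(test)
print(card)

# Encode only the columns at or below a hypothetical threshold of levels.
# sparse=True asks pandas to back the dummy columns with sparse arrays.
MAX_LEVELS = 50
low_card_cols = list(card[card <= MAX_LEVELS].index)
encoded = pd.get_dummies(test, columns=low_card_cols, drop_first=True, sparse=True)
```

If one column turns out to have tens of thousands of levels (an ID, a free-text field, or a timestamp stored as a string, for example), one-hot encoding it is probably not what is wanted, and it may need to be dropped or encoded some other way before calling get_dummies.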