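I have a database with 322,098 observations and 3,868 variables. To test the script, I generated a subsample with 50 observations and the same 3,868 variables. When I run the script on the subsample, it works perfectly. However, when I run it on the full database (322,098 observations), dropping the trade variable from the DataFrame raises an error.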
Here is the script:
import pandas as pd

## Load External DataSet
mydata = pd.read_csv ('C:\\Users\\Inspiron\\Desktop\\policies.csv', sep = ',', na_values = '.')
## Normalized Data
mydata ['normalized'] = (mydata ['trade'] - mydata ['trade'].min ())/(mydata ['trade'].max () - mydata ['trade'].min ())
## Descriptive Statistics for a Single Variable
print (mydata ['normalized'].describe ())
## Drop Columns
mydata = mydata.drop (['trade'], axis = 1)
Here is the error:
Traceback (most recent call last):
File "C:\Users\Inspiron\OneDrive\academic\articles\2018\non-discriminatory\script-dofile\mfn.py", line 31, in <module>
mydata = mydata.drop (['trade'], axis = 1)
File "C:\Python36\lib\site-packages\pandas\core\generic.py", line 2530, in drop
obj = obj._drop_axis(labels, axis, level=level, errors=errors)
File "C:\Python36\lib\site-packages\pandas\core\generic.py", line 2563, in _drop_axis
dropped = self.reindex(**{axis_name: new_axis})
File "C:\Python36\lib\site-packages\pandas\util\_decorators.py", line 127, in wrapper
return func(*args, **kwargs)
File "C:\Python36\lib\site-packages\pandas\core\frame.py", line 2935, in reindex
return super(DataFrame, self).reindex(**kwargs)
File "C:\Python36\lib\site-packages\pandas\core\generic.py", line 3004, in reindex
self._consolidate_inplace()
File "C:\Python36\lib\site-packages\pandas\core\generic.py", line 3677, in _consolidate_inplace
self._protect_consolidate(f)
File "C:\Python36\lib\site-packages\pandas\core\generic.py", line 3666, in _protect_consolidate
result = f()
File "C:\Python36\lib\site-packages\pandas\core\generic.py", line 3675, in f
self._data = self._data.consolidate()
File "C:\Python36\lib\site-packages\pandas\core\internals.py", line 3826, in consolidate
bm._consolidate_inplace()
File "C:\Python36\lib\site-packages\pandas\core\internals.py", line 3831, in _consolidate_inplace
self.blocks = tuple(_consolidate(self.blocks))
File "C:\Python36\lib\site-packages\pandas\core\internals.py", line 4853, in _consolidate
_can_consolidate=_can_consolidate)
File "C:\Python36\lib\site-packages\pandas\core\internals.py", line 4876, in _merge_blocks
new_values = new_values[argsort]
MemoryError
Can anyone help me?
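For a sense of scale, here is a rough sketch of the memory arithmetic. It assumes every one of the 3,868 columns is float64 (8 bytes per value); that is an assumption, since the actual dtypes in policies.csv are not stated. The traceback ends in pandas' block consolidation (`_consolidate`, `_merge_blocks`), which builds new arrays, so at that point the process likely needs room for more than one copy of the data.

```python
import numpy as np
import pandas as pd


def estimated_float64_bytes(n_rows: int, n_cols: int) -> int:
    """Size of the data buffer if every column were float64 (assumption: 8 bytes per cell)."""
    return n_rows * n_cols * 8


# Stand-in for the 50-row subsample; all-float64 columns are an assumption,
# the real policies.csv may contain other dtypes.
sample = pd.DataFrame(np.zeros((50, 3868)))
measured = int(sample.memory_usage(index=False, deep=True).sum())
print(measured == estimated_float64_bytes(50, 3868))  # True

# One copy of the full data under the same float64 assumption.
full = estimated_float64_bytes(322_098, 3_868)
print(f"{full / 1e9:.2f} GB")  # 9.97 GB
```

On real data, `mydata.memory_usage(deep=True).sum()` reports the actual footprint instead of this float64 estimate.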
Answer 1:
I would try using sklearn.preprocessing.MinMaxScaler instead:
from sklearn.preprocessing import MinMaxScaler
mms = MinMaxScaler()
# use `.to_frame()` to prevent `ValueError: Expected 2D array, got 1D array instead:`
mydata['normalized'] = mms.fit_transform(mydata.pop('trade').to_frame())
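A self-contained sketch of the same approach on a tiny, made-up DataFrame (the values are hypothetical, not from policies.csv). `pop()` removes `trade` from the frame while returning it, so no separate `drop` call is needed afterwards. The `.ravel()` is an added choice, not part of the answer: it flattens the `(n, 1)` array that `fit_transform` returns into a plain 1-D column.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Toy stand-in for mydata (hypothetical values).
mydata = pd.DataFrame({"trade": [10.0, 20.0, 30.0, 50.0], "other": [1, 2, 3, 4]})

mms = MinMaxScaler()
# pop() removes 'trade' from mydata and returns it as a Series;
# to_frame() turns it into the 2-D input that fit_transform expects.
scaled = mms.fit_transform(mydata.pop("trade").to_frame())
# fit_transform returns an (n, 1) array; ravel() flattens it to 1-D.
mydata["normalized"] = scaled.ravel()

print(mydata)
```

Afterwards `mydata` holds only `other` and `normalized`, with `normalized` equal to 0.0, 0.25, 0.5 and 1.0.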
Answer 2:
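Instead of creating a new normalized variable from the trade variable, normalize the trade variable directly, so there is no extra column to drop afterwards.
Use the following command: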
## Normalized Data
mydata ['trade'] = (mydata ['trade'] - mydata ['trade'].min ())/(mydata ['trade'].max () - mydata ['trade'].min ())
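A minimal sketch of the same in-place idea, using a hypothetical helper `normalize_in_place` and a tiny made-up DataFrame. It computes the minimum and maximum once each and overwrites the column, so the `drop` call from the traceback is never made. Whether this alone avoids the MemoryError on the full 322,098-row data is not guaranteed.

```python
import pandas as pd


def normalize_in_place(df: pd.DataFrame, col: str) -> None:
    """Hypothetical helper: min-max scale df[col] into [0, 1], overwriting the column."""
    col_min = df[col].min()  # computed once
    col_max = df[col].max()  # computed once
    df[col] = (df[col] - col_min) / (col_max - col_min)


# Toy stand-in for mydata (hypothetical values).
mydata = pd.DataFrame({"trade": [10.0, 20.0, 30.0, 50.0], "other": [1, 2, 3, 4]})
normalize_in_place(mydata, "trade")
print(mydata["trade"].tolist())  # [0.0, 0.25, 0.5, 1.0]
```

If the column should still be called `normalized`, `mydata.rename(columns={'trade': 'normalized'})` changes only the label.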