Error when trying to drop a variable from a DataFrame in Python 3.6.3

Asked: 2018-02-11 23:50:38

Tags: python-3.x pandas compiler-errors

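I have a database with 322,098 observations and 3,868 variables. To test the script, I generated a subsample with 50 observations and 3,868 variables. When I run the script on the subsample, it works perfectly. However, when I run it on the full database (322,098 observations), dropping the trade variable from the DataFrame raises an error.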

Here is the script:

import pandas as pd

## Load External DataSet

mydata = pd.read_csv('C:\\Users\\Inspiron\\Desktop\\policies.csv', sep=',', na_values='.')

## Normalized Data

mydata['normalized'] = (mydata['trade'] - mydata['trade'].min()) / (mydata['trade'].max() - mydata['trade'].min())

## Descriptive Statistics for a Single Variable

mydata['normalized'].describe()

## Drop Columns

mydata = mydata.drop(['trade'], axis=1)

Here is the error:

Traceback (most recent call last):
  File "C:\Users\Inspiron\OneDrive\academic\articles\2018\non-discriminatory\script-dofile\mfn.py", line 31, in <module>
    mydata = mydata.drop (['trade'], axis = 1)
  File "C:\Python36\lib\site-packages\pandas\core\generic.py", line 2530, in drop
    obj = obj._drop_axis(labels, axis, level=level, errors=errors)
  File "C:\Python36\lib\site-packages\pandas\core\generic.py", line 2563, in _drop_axis
    dropped = self.reindex(**{axis_name: new_axis})
  File "C:\Python36\lib\site-packages\pandas\util\_decorators.py", line 127, in wrapper
    return func(*args, **kwargs)
  File "C:\Python36\lib\site-packages\pandas\core\frame.py", line 2935, in reindex
    return super(DataFrame, self).reindex(**kwargs)
  File "C:\Python36\lib\site-packages\pandas\core\generic.py", line 3004, in reindex
    self._consolidate_inplace()
  File "C:\Python36\lib\site-packages\pandas\core\generic.py", line 3677, in _consolidate_inplace
    self._protect_consolidate(f)
  File "C:\Python36\lib\site-packages\pandas\core\generic.py", line 3666, in _protect_consolidate
    result = f()
  File "C:\Python36\lib\site-packages\pandas\core\generic.py", line 3675, in f
    self._data = self._data.consolidate()
  File "C:\Python36\lib\site-packages\pandas\core\internals.py", line 3826, in consolidate
    bm._consolidate_inplace()
  File "C:\Python36\lib\site-packages\pandas\core\internals.py", line 3831, in _consolidate_inplace
    self.blocks = tuple(_consolidate(self.blocks))
  File "C:\Python36\lib\site-packages\pandas\core\internals.py", line 4853, in _consolidate
    _can_consolidate=_can_consolidate)
  File "C:\Python36\lib\site-packages\pandas\core\internals.py", line 4876, in _merge_blocks
    new_values = new_values[argsort]
MemoryError

Can anyone help me?

2 answers:

Answer 0 (score: 1)

I would try using sklearn.preprocessing.MinMaxScaler instead:

from sklearn.preprocessing import MinMaxScaler

mms = MinMaxScaler()

# use `.to_frame()` to prevent `ValueError: Expected 2D array, got 1D array instead:`    
mydata['normalized'] = mms.fit_transform(mydata.pop('trade').to_frame())
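
To see what `pop` and `.to_frame()` contribute in the line above, here is a minimal sketch on a toy frame using only pandas. The column values and the extra `other` column are made up for illustration and are not from the original data.

```python
import pandas as pd

# Toy stand-in for the real data; the values are made up for illustration.
mydata = pd.DataFrame({"trade": [10.0, 20.0, 40.0], "other": [1, 2, 3]})

# pop() removes the column from the frame and returns it as a Series,
# so no separate drop() call is needed afterwards.
trade = mydata.pop("trade")
remaining_columns = list(mydata.columns)

# to_frame() turns the 1-D Series into a 2-D single-column DataFrame,
# which is the input shape that scikit-learn transformers expect.
trade_2d = trade.to_frame()
shape_2d = trade_2d.shape
```

After this runs, `mydata` no longer contains `trade`, and `trade_2d` is the two-dimensional object that would be passed to `fit_transform`.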

Answer 1 (score: 1)

Instead of creating a normalized variable from the trade variable, normalize the trade variable directly.

Run the following command:

## Normalized Data

mydata['trade'] = (mydata['trade'] - mydata['trade'].min()) / (mydata['trade'].max() - mydata['trade'].min())
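
As a rough check of the in-place approach, the sketch below applies the same min-max formula to a small toy frame. The values and the `other` column are assumptions made for illustration; the names `trade_min` and `trade_max` are introduced here only to avoid computing each statistic twice.

```python
import pandas as pd

# Toy stand-in for the real data; the values are made up for illustration.
mydata = pd.DataFrame({"trade": [10.0, 20.0, 40.0], "other": [1, 2, 3]})

# Compute the column minimum and maximum once.
trade_min = mydata["trade"].min()
trade_max = mydata["trade"].max()

# Overwrite 'trade' with its min-max scaled values, so no extra column
# is created and no drop() call is needed afterwards.
mydata["trade"] = (mydata["trade"] - trade_min) / (trade_max - trade_min)

normalized = mydata["trade"].tolist()
columns_after = list(mydata.columns)
```

The smallest value maps to 0 and the largest to 1, and the frame keeps its original set of columns. Note that if every value in the column were identical, the denominator would be zero and the result would be NaN.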