I have a large dataset that I read from a text file, and I want to perform an operation on it. I use
T[fields[0:-1]] = T[fields[0:-1]].astype(float)
to make sure that all the values are floats. On one of the columns I get the error "setting an array element with a sequence". I replaced the 'NaN' strings with np.nan using
T.replace('NaN', np.nan)
but the issue remains. I used
dtypeCount = [T.iloc[:, i].apply(type).value_counts() for i in range(T.shape[1])]
to determine the types of the values in that column, and this is the result:
<class 'NoneType'>    3676479
<class 'float'>        192217
Name: PD_PRESSURE, dtype: int64
Because of the size of the dataset, I can't figure out where this is coming from. Any thoughts on how I can solve this, or how I can find what is causing it?
Thanks in advance.
Update: full error message:
ValueError Traceback (most recent call last)
<ipython-input-33-fa0a78194654> in <module>()
162 if Aggregate_Flag==1:
163 # This line make sure that all the data are defined as float
--> 164 T[fields[0:-1]]=T[fields[0:-1]].astype(float)
165 # defining the function inside the loop is not the best practice. However, since the number of iterations
166 #( number of file are small), I put it insider the loop to improve the readibility of the code.
~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\util\_decorators.py in wrapper(*args, **kwargs)
116 else:
117 kwargs[new_arg_name] = new_arg_value
--> 118 return func(*args, **kwargs)
119 return wrapper
120 return _deprecate_kwarg
~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\generic.py in astype(self, dtype, copy, errors, **kwargs)
4002 # else, only a single dtype is given
4003 new_data = self._data.astype(dtype=dtype, copy=copy, errors=errors,
-> 4004 **kwargs)
4005 return self._constructor(new_data).__finalize__(self)
4006
~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\internals.py in astype(self, dtype, **kwargs)
3460
3461 def astype(self, dtype, **kwargs):
-> 3462 return self.apply('astype', dtype=dtype, **kwargs)
3463
3464 def convert(self, **kwargs):
~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\internals.py in apply(self, f, axes, filter, do_integrity_check, consolidate, **kwargs)
3327
3328 kwargs['mgr'] = self
-> 3329 applied = getattr(b, f)(**kwargs)
3330 result_blocks = _extend_blocks(applied, result_blocks)
3331
~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\internals.py in astype(self, dtype, copy, errors, values, **kwargs)
542 def astype(self, dtype, copy=False, errors='raise', values=None, **kwargs):
543 return self._astype(dtype, copy=copy, errors=errors, values=values,
--> 544 **kwargs)
545
546 def _astype(self, dtype, copy=False, errors='raise', values=None,
~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\internals.py in _astype(self, dtype, copy, errors, values, klass, mgr, **kwargs)
623
624 # _astype_nansafe works fine with 1-d only
--> 625 values = astype_nansafe(values.ravel(), dtype, copy=True)
626 values = values.reshape(self.shape)
627
~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\dtypes\cast.py in astype_nansafe(arr, dtype, copy)
701
702 if copy:
--> 703 return arr.astype(dtype)
704 return arr.view(dtype)
705
ValueError: setting an array element with a sequence.
If I exclude the column PD_PRESSURE, I don't get any error.
I also tried T['PD_PRESSURE'].dtype(float) and I get the error, but for the other columns it works fine.
Running T[fields[0:-1]] = T[fields[0:-1]] by itself works fine. Based on this, I think the error is probably coming from the PD_PRESSURE column.
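One way to narrow this down might be to look for entries in the suspect column that are not scalars at all. The snippet below is only a minimal sketch on made-up data: the small Series `s` stands in for T['PD_PRESSURE'], and the list entry in it is an assumed culprit, not something confirmed from the real file.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for T['PD_PRESSURE']; the list at position 2 is an
# assumed example of a value that cannot be cast to a single float.
s = pd.Series([1.5, None, [2.0, 3.0], 4.0], name="PD_PRESSURE", dtype=object)

# Flag every entry that is a list, tuple, or NumPy array instead of a scalar.
is_seq = s.apply(lambda v: isinstance(v, (list, tuple, np.ndarray)))

# Keep only the flagged rows so their index labels and contents can be inspected.
offending = s[is_seq]
print(offending)
```

On the real data the same mask could be built from T['PD_PRESSURE'], and the index of `offending` would then point at the rows to check in the source file.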
T.info() returns:
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 3868696 entries, 2000-01-01 to 2017-04-11
Data columns (total 6 columns):
A              object
B              object
C              object
D              object
PD_PRESSURE    object
F              object
dtypes: object(6)
memory usage: 1.0+ GB
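Since every column is stored as object, a possible alternative to astype(float) is pd.to_numeric with errors="coerce", which turns unparseable strings into NaN instead of raising. The following is a sketch under assumptions: the sample Series is invented, and it assumes the problematic entries are strings. It may not apply if the column actually holds lists or arrays.

```python
import pandas as pd

# Hypothetical object column mixing numeric strings, None, a bad string, and a float.
s = pd.Series(["1.5", None, "bad", 4.0], dtype=object)

# Coerce to numbers; values that cannot be parsed become NaN.
converted = pd.to_numeric(s, errors="coerce")

# Entries that were present in the original but became NaN only through coercion.
newly_nan = s[converted.isna() & s.notna()]
print(converted.dtype)
print(newly_nan)
```

Inspecting `newly_nan` before overwriting the column shows which raw values would be lost by the conversion.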