Question

I have a large dataset that I am reading from a text file, and I want to perform an operation on it. I use

T[fields[0:-1]]=T[fields[0:-1]].astype(float)

to make sure that all the values are floats. I get the error "setting an array element with a sequence" on one of the columns. I tried T.replace('NaN', np.nan) to change the 'NaN' strings to nan, but I still get the same error. I used
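One thing worth checking here: DataFrame.replace returns a new object by default, so calling it without assigning the result leaves T unchanged. The small sketch below uses a made-up three-row frame (not the real data) to show the difference. Whether this applies depends on how the call was actually written.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature stand-in for T; the real data comes from the text file.
T = pd.DataFrame({"PD_PRESSURE": ["NaN", 1.5, 2.0]}, dtype=object)

# Result is discarded, so T still contains the string 'NaN'.
T.replace("NaN", np.nan)
still_has_string = bool((T["PD_PRESSURE"] == "NaN").any())

# Assigning the result back actually updates T.
T = T.replace("NaN", np.nan)
has_string_after = bool((T["PD_PRESSURE"] == "NaN").any())
```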

dtypeCount =[T.iloc[:,i].apply(type).value_counts() for i in range(T.shape[1])]

to determine the types of the data in that column, and this is the result:

<class 'NoneType'>    3676479
<class 'float'>        192217
Name: PD_PRESSURE, dtype: int64

Due to the size of the dataset, I can't figure out where this is coming from. Any thoughts on how I can solve this, or how I can find what is causing it?
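One possible way to narrow this down on a large column is to filter for entries that are themselves sequences (lists, tuples, arrays), since those are the kind of values that typically trigger this message. The helper name find_non_scalar and the sample Series below are hypothetical, for illustration only. It is a minimal sketch, not a guaranteed diagnosis.

```python
import numpy as np
import pandas as pd

def find_non_scalar(column):
    """Return the entries of `column` that are list-like instead of scalar."""
    is_sequence = column.apply(lambda v: isinstance(v, (list, tuple, np.ndarray)))
    return column[is_sequence]

# Hypothetical sample: one entry is a list hidden among floats and None.
sample = pd.Series([1.0, None, [2.0, 3.0], 4.0], name="PD_PRESSURE", dtype=object)
offenders = find_non_scalar(sample)
offending_index = offenders.index.tolist()
```

On the real data, find_non_scalar(T['PD_PRESSURE']) would return only the offending rows together with their index labels, which is far easier to inspect than the full column.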

Thanks in advance.

Update: Full Error message

ValueError                                Traceback (most recent call last)
<ipython-input-33-fa0a78194654> in <module>()
    162     if Aggregate_Flag==1:
    163         # This line make sure that all the data are defined as float
--> 164         T[fields[0:-1]]=T[fields[0:-1]].astype(float)
    165         # defining the function inside the loop is not the best practice. However, since the number of iterations
    166         #( number of file are small), I put it insider the loop to improve the readibility of the code.

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\util\_decorators.py in wrapper(*args, **kwargs)
    116                 else:
    117                     kwargs[new_arg_name] = new_arg_value
--> 118             return func(*args, **kwargs)
    119         return wrapper
    120     return _deprecate_kwarg

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\generic.py in astype(self, dtype, copy, errors, **kwargs)
   4002         # else, only a single dtype is given
   4003         new_data = self._data.astype(dtype=dtype, copy=copy, errors=errors,
-> 4004                                      **kwargs)
   4005         return self._constructor(new_data).__finalize__(self)
   4006 

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\internals.py in astype(self, dtype, **kwargs)
   3460 
   3461     def astype(self, dtype, **kwargs):
-> 3462         return self.apply('astype', dtype=dtype, **kwargs)
   3463 
   3464     def convert(self, **kwargs):

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\internals.py in apply(self, f, axes, filter, do_integrity_check, consolidate, **kwargs)
   3327 
   3328             kwargs['mgr'] = self
-> 3329             applied = getattr(b, f)(**kwargs)
   3330             result_blocks = _extend_blocks(applied, result_blocks)
   3331 

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\internals.py in astype(self, dtype, copy, errors, values, **kwargs)
    542     def astype(self, dtype, copy=False, errors='raise', values=None, **kwargs):
    543         return self._astype(dtype, copy=copy, errors=errors, values=values,
--> 544                             **kwargs)
    545 
    546     def _astype(self, dtype, copy=False, errors='raise', values=None,

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\internals.py in _astype(self, dtype, copy, errors, values, klass, mgr, **kwargs)
    623 
    624                 # _astype_nansafe works fine with 1-d only
--> 625                 values = astype_nansafe(values.ravel(), dtype, copy=True)
    626                 values = values.reshape(self.shape)
    627 

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\dtypes\cast.py in astype_nansafe(arr, dtype, copy)
    701 
    702     if copy:
--> 703         return arr.astype(dtype)
    704     return arr.view(dtype)
    705 

ValueError: setting an array element with a sequence.

If I exclude column PD_PRESSURE, I don't receive any error.

I also tried T['PD_PRESSURE'].dtype(float) and I get the error, but for the other columns it works fine.

If I run T[fields[0:-1]]=T[fields[0:-1]] by itself, it works fine. Based on this, I think the error is probably coming from the PD_PRESSURE column.
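The column-by-column check described above could be automated roughly as follows, converting each column separately and recording which ones raise. The function name columns_failing_float and the demo frame are assumptions made for this sketch. On the real data the call would presumably be columns_failing_float(T, fields[0:-1]).

```python
import pandas as pd

def columns_failing_float(df, columns):
    """Try astype(float) on each column; map failing column names to the error text."""
    failures = {}
    for name in columns:
        try:
            df[name].astype(float)
        except (ValueError, TypeError) as exc:
            failures[name] = str(exc)
    return failures

# Hypothetical demo frame: column A converts cleanly, PD_PRESSURE holds a list.
demo = pd.DataFrame(
    {"A": ["1.0", "2.5"], "PD_PRESSURE": [[1.0, 2.0], None]},
    dtype=object,
)
failed = columns_failing_float(demo, ["A", "PD_PRESSURE"])
```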

T.info() returns

<class 'pandas.core.frame.DataFrame'> 
DatetimeIndex: 3868696 entries, 2000-01-01 to 2017-04-11 
columns (total 6 columns): 
A object 
B object 
C object 
D object 
PD_PRESSURE object 
F object 
dtypes: object(6) 
memory usage: 1.0+ GB

Error setting an array element with a sequence, when using Pandas, astype (float)
