Adding a new column to a DataFrame: Length of values does not match length of index

Time: 2016-08-01 09:22:07

Tags: python pandas

My DataFrame contains the columns TIMESTAMP, P_ACT_KW, and P_SOUSCR:

df2 = pd.read_csv('C:/Users/Demonstrator/Downloads/power.csv',delimiter=';')

First, I dropped the rows with missing values:

df_no_missing = df2.dropna()

Then I tried to add a new column named depassement, which should contain 0 if (df2['P_ACT_KW'] - df2['P_SOUSCR']) < 0, else df2['P_ACT_KW'] - df2['P_SOUSCR']:

df_no_missing['depassement'] = np.where((df_no_missing['P_SOUSCR'] - df_no_missing['P_ACT_KW']) < 0), 0, df_no_missing['P_ACT_KW'] - df_no_missing['P_SOUSCR']

But I get this error:

ValueError                                Traceback (most recent call last)
 in ()
----> 1 df_no_missing['depassement'] = np.where((df_no_missing['P_SOUSCR'] - df_no_missing['P_ACT_KW'])
   2357             self._set_item(key, value)
   2358
   2359     def _setitem_slice(self, key, value):

C:\Users\Demonstrator\Anaconda3\lib\site-packages\pandas\core\frame.py in _set_item(self, key, value)
   2421
   2422         self._ensure_valid_index(value)
-> 2423         value = self._sanitize_column(key, value)
   2424         NDFrame._set_item(self, key, value)
   2425

C:\Users\Demonstrator\Anaconda3\lib\site-packages\pandas\core\frame.py in _sanitize_column(self, key, value)
   2576
   2577             # turn me into an ndarray
-> 2578             value = _sanitize_index(value, self.index, copy=False)
   2579             if not isinstance(value, (np.ndarray, Index)):
   2580                 if isinstance(value, list) and len(value) > 0:

C:\Users\Demonstrator\Anaconda3\lib\site-packages\pandas\core\series.py in _sanitize_index(data, index, copy)
   2768
   2769     if len(data) != len(index):
-> 2770         raise ValueError('Length of values does not match length of ' 'index')
   2771
   2772     if isinstance(data, PeriodIndex):

ValueError: Length of values does not match length of index

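Any ideas on how to resolve this problem?

1 Answer:

Answer 0: (score: 0)

You can pass the parameter inplace=True to dropna so that the NaN rows are removed from df2 in place, and correct the parentheses in the np.where call: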

import pandas as pd
import numpy as np

df2 = pd.DataFrame({'P_SOUSCR':[10,2,1,np.nan],
                    'P_ACT_KW':[4,5,6,4]})

df2.dropna(inplace=True)
print (df2)
   P_ACT_KW  P_SOUSCR
0         4      10.0
1         5       2.0
2         6       1.0

df2['depassement'] = np.where((df2['P_SOUSCR'] - df2['P_ACT_KW']) < 0,
                               0, 
                               df2['P_ACT_KW'] - df2['P_SOUSCR'])
print (df2)
   P_ACT_KW  P_SOUSCR  depassement
0         4      10.0         -6.0
1         5       2.0          0.0
2         6       1.0          0.0
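
As for why the original line raised the error: the closing parenthesis of np.where sits right after the condition, so np.where only receives the condition, and the trailing ", 0, ..." turns the right-hand side into a 3-element tuple. pandas then tries to use that tuple as the column values and its length does not match the index. A minimal sketch of this, using made-up data with four rows (the values are an assumption, not the asker's CSV):

```python
import numpy as np
import pandas as pd

# Hypothetical data: four complete rows, so the index length is 4.
df = pd.DataFrame({'P_SOUSCR': [10.0, 2.0, 1.0, 3.0],
                   'P_ACT_KW': [4.0, 5.0, 6.0, 4.0]})

# Same parenthesis placement as in the question: np.where() gets only the
# condition, and the two trailing items make the whole expression a tuple.
rhs = np.where((df['P_SOUSCR'] - df['P_ACT_KW']) < 0), 0, df['P_ACT_KW'] - df['P_SOUSCR']
print(type(rhs).__name__, len(rhs))  # tuple 3

# Assigning a 3-element tuple to a 4-row DataFrame fails the length check.
raised = False
try:
    df['depassement'] = rhs
except ValueError as exc:
    raised = True
    print(exc)
```

Moving the parenthesis so that all three arguments are inside np.where(...), as in the corrected code above, gives an array with one value per row.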

Another solution is to add copy(), so that df_no_missing is an independent DataFrame:

df_no_missing = df2.dropna().copy()
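
Note that the condition in the question tests P_SOUSCR - P_ACT_KW < 0, which is the opposite of the stated goal (0 when P_ACT_KW - P_SOUSCR is negative, otherwise the difference). If the stated goal is what is wanted, a sketch along these lines may be closer, reusing the sample data from this answer; the names diff and clipped are only illustrative:

```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame({'P_SOUSCR': [10, 2, 1, np.nan],
                    'P_ACT_KW': [4, 5, 6, 4]})

# Work on an independent copy of the rows without missing values.
df_no_missing = df2.dropna().copy()

# Overshoot of actual power over subscribed power.
diff = df_no_missing['P_ACT_KW'] - df_no_missing['P_SOUSCR']

# 0 where the difference is negative, the difference itself otherwise.
df_no_missing['depassement'] = np.where(diff < 0, 0, diff)

# Series.clip(lower=0) expresses the same rule without np.where.
clipped = diff.clip(lower=0)

print(df_no_missing['depassement'].tolist())  # [0.0, 3.0, 5.0]
```

df2 itself keeps all four rows here, since dropna() without inplace=True returns a new object.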