My DataFrame contains the columns TIMESTAMP, P_ACT_KW, and P_SOUSCR.
df2 = pd.read_csv('C:/Users/Demonstrator/Downloads/power.csv',delimiter=';')
First, I dropped the rows with missing values:
df_no_missing = df2.dropna()
Then I tried to add a new column named depassement, which should contain 0 if (df2['P_ACT_KW'] - df2['P_SOUSCR']) < 0 else df2['P_ACT_KW'] - df2['P_SOUSCR'].
df_no_missing['depassement'] = np.where((df_no_missing['P_SOUSCR'] - df_no_missing['P_ACT_KW']) < 0), 0, df_no_missing['P_ACT_KW'] - df_no_missing['P_SOUSCR']
But I got this error:
ValueError                                Traceback (most recent call last)
in ()
----> 1 df_no_missing['depassement'] = np.where((df_no_missing['P_SOUSCR'] - df_no_missing['P_ACT_KW'])
   2357             self._set_item(key, value)
   2358
   2359     def _setitem_slice(self, key, value):

C:\Users\Demonstrator\Anaconda3\lib\site-packages\pandas\core\frame.py in _set_item(self, key, value)
   2421
   2422         self._ensure_valid_index(value)
-> 2423         value = self._sanitize_column(key, value)
   2424         NDFrame._set_item(self, key, value)
   2425

C:\Users\Demonstrator\Anaconda3\lib\site-packages\pandas\core\frame.py in _sanitize_column(self, key, value)
   2576
   2577             # turn me into an ndarray
-> 2578             value = _sanitize_index(value, self.index, copy=False)
   2579             if not isinstance(value, (np.ndarray, Index)):
   2580                 if isinstance(value, list) and len(value) > 0:

C:\Users\Demonstrator\Anaconda3\lib\site-packages\pandas\core\series.py in _sanitize_index(data, index, copy)
   2768
   2769     if len(data) != len(index):
-> 2770         raise ValueError('Length of values does not match length of ' 'index')
   2771
   2772     if isinstance(data, PeriodIndex):

ValueError: Length of values does not match length of index
Any ideas on how to solve this problem?
Answer 0 (score: 0)
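You can pass inplace=True to dropna so that the rows containing NaN are removed from df2 in place, and correct the parentheses in the np.where call: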
import pandas as pd
import numpy as np
df2 = pd.DataFrame({'P_SOUSCR':[10,2,1,np.nan],
                    'P_ACT_KW':[4,5,6,4]})
df2.dropna(inplace=True)
print (df2)
   P_ACT_KW  P_SOUSCR
0         4      10.0
1         5       2.0
2         6       1.0
df2['depassement'] = np.where((df2['P_SOUSCR'] - df2['P_ACT_KW']) < 0,
                               0,
                               df2['P_ACT_KW'] - df2['P_SOUSCR'])
print (df2)
   P_ACT_KW  P_SOUSCR  depassement
0         4      10.0         -6.0
1         5       2.0          0.0
2         6       1.0          0.0
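As for why the parentheses matter: in the line from the question, the call to np.where is closed right after the condition, so the right-hand side becomes a 3-element tuple (the result of np.where(cond), then 0, then the difference Series). Assigning that tuple to a column fails whenever the DataFrame does not happen to have exactly three rows. The sketch below illustrates this on made-up data; the names df, cond, rhs, and fixed are only for illustration.

```python
import numpy as np
import pandas as pd

# Made-up sample data with four rows (not the real power.csv)
df = pd.DataFrame({'P_SOUSCR': [10.0, 2.0, 1.0, 3.0],
                   'P_ACT_KW': [4, 5, 6, 4]})
cond = (df['P_SOUSCR'] - df['P_ACT_KW']) < 0

# Parenthesis closed too early: this builds a tuple, not an array.
# It is equivalent to (np.where(cond), 0, df['P_ACT_KW'] - df['P_SOUSCR'])
rhs = np.where(cond), 0, df['P_ACT_KW'] - df['P_SOUSCR']
print(type(rhs), len(rhs))

# With a single argument, np.where returns a tuple of index arrays
print(np.where(cond))

# All three arguments inside the call: one value per row
fixed = np.where(cond, 0, df['P_ACT_KW'] - df['P_SOUSCR'])
print(len(fixed) == len(df))
```

Here rhs has length 3 while df has four rows, which is the kind of mismatch the "Length of values does not match length of index" message reports.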
Another solution is to add copy() after dropna(), so that df_no_missing is an independent copy:
df_no_missing = df2.dropna().copy()
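A minimal sketch of the copy() variant, using made-up sample data in place of power.csv. Note that the condition taken from the question tests P_SOUSCR - P_ACT_KW, which is why the first row above comes out as -6.0. If the goal is as described in the question (0 whenever P_ACT_KW - P_SOUSCR is negative), the test would go on that difference instead, which is what this sketch assumes. The variable name diff is only for illustration.

```python
import numpy as np
import pandas as pd

# Made-up sample data standing in for power.csv
df2 = pd.DataFrame({'P_SOUSCR': [10, 2, 1, np.nan],
                    'P_ACT_KW': [4, 5, 6, 4]})

# dropna() returns a new object; copy() makes it explicitly independent of df2
df_no_missing = df2.dropna().copy()

# Assumption: depassement should be 0 when consumption is below the subscription
diff = df_no_missing['P_ACT_KW'] - df_no_missing['P_SOUSCR']
df_no_missing['depassement'] = np.where(diff < 0, 0, diff)

print(df_no_missing)
```

The same column could also be written as diff.clip(lower=0), which replaces every negative difference with 0 without spelling out the condition.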