Adding new columns to a DataFrame based on conditions, using Python

Time: 2016-08-23 17:41:17

Tags: python pandas dataframe

I created a DataFrame df_energy:

df_energy=pd.read_csv('C:/Users/Demonstrator/Downloads/power.csv', delimiter=';', parse_dates=[0], infer_datetime_format = True)

It has this structure:

 df_energy.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 43229 entries, 0 to 43228
Data columns (total 6 columns):
TIMESTAMP        43229 non-null datetime64[ns]
P_ACT_KW         40376 non-null float64
PERIODE_TARIF    43209 non-null object
P_SOUSCR         37501 non-null float64
SITE             43229 non-null object
TARIF            43229 non-null object
dtypes: datetime64[ns](1), float64(2), object(3)
memory usage: 2.0+ MB


TIMESTAMP P_ACT_KW PERIODE_TARIF P_SOUSCR SITE TARIF 
2015-07-31 23:00:00 12.0 HC NaN ST GEREON TURPE_HTA5 
2015-07-31 23:10:00 466.0 HC 425.0 ST GEREON TURPE_HTA5 
2015-07-31 23:20:00 18.0 HC 425.0 ST GEREON TURPE_HTA5 
2015-07-31 23:30:00 17.0 HC 425.0 ST GEREON TURPE_HTA5

As I am just starting to learn Python, I would like to know how I can add three new columns: High_energy, Medium_energy and Low_energy.

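High_energy should contain the P_ACT_KW value if P_ACT_KW > 400, Medium_energy should contain the P_ACT_KW value if P_ACT_KW is between 200 and 400, and Low_energy should contain the P_ACT_KW value if P_ACT_KW < 200. In each case the other two columns are 0. For example: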

TIMESTAMP P_ACT_KW PERIODE_TARIF P_SOUSCR SITE TARIF High_energy Medium_energy Low_energy
2015-07-31 23:00:00 12.0 HC NaN ST GEREON TURPE_HTA5 0 0 12
2015-07-31 23:10:00 466.0 HC 425.0 ST GEREON TURPE_HTA5 466 0 0
2015-07-31 23:20:00 18.0 HC 425.0 ST GEREON TURPE_HTA5 0 0 18
2015-07-31 23:30:00 17.0 HC 425.0 ST GEREON TURPE_HTA5 0 0 17

Thank you.

Kind regards

1 Answer:

Answer 0 (score: 3)

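You can use np.where from numpy. It takes a condition, the value to use where the condition is true, and the value to use where it is false.
Sample df: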

Out[71]: 
             TIMESTAMP  P_ACT_KW PERIODE_TARIF  P_SOUSCR       SITE  \
0  2015-07-31 23:00:00      12.0            HC       NaN  ST GEREON   
1  2015-07-31 23:10:00     466.0            HC     425.0  ST GEREON   
2  2015-07-31 23:20:00      18.0            HC     425.0  ST GEREON   
3  2015-07-31 23:30:00      17.0            HC     425.0  ST GEREON   

        TARIF  
0  TURPE_HTA5  
1  TURPE_HTA5  
2  TURPE_HTA5  
3  TURPE_HTA5

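import numpy as np

# Keep P_ACT_KW where the condition holds, otherwise 0
df['high_energy'] = np.where(df['P_ACT_KW'] > 400, df['P_ACT_KW'], 0)

# 200 and 400 are counted as medium here so that no value is left out
df['medium_energy'] = np.where((df['P_ACT_KW'] >= 200) & (df['P_ACT_KW'] <= 400), df['P_ACT_KW'], 0)

df['low_energy'] = np.where(df['P_ACT_KW'] < 200, df['P_ACT_KW'], 0)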

Out[72]: 
             TIMESTAMP  P_ACT_KW PERIODE_TARIF  P_SOUSCR       SITE  \
0  2015-07-31 23:00:00      12.0            HC       NaN  ST GEREON   
1  2015-07-31 23:10:00     466.0            HC     425.0  ST GEREON   
2  2015-07-31 23:20:00      18.0            HC     425.0  ST GEREON   
3  2015-07-31 23:30:00      17.0            HC     425.0  ST GEREON   

        TARIF  high_energy  medium_energy  low_energy  
0  TURPE_HTA5          0.0            0.0        12.0  
1  TURPE_HTA5        466.0            0.0         0.0  
2  TURPE_HTA5          0.0            0.0        18.0  
3  TURPE_HTA5          0.0            0.0        17.0
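
The same idea can also be written with pandas alone. The following is a minimal sketch, not part of the original answer: it uses Series.where, which keeps a value where the condition is true and substitutes the second argument elsewhere. The sample values are hypothetical and were chosen to include the boundaries 200 and 400 and a missing reading. Treating both boundaries as medium is an assumption, since the question does not say whether "between 200 and 400" is inclusive.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: only the P_ACT_KW column, with boundary values and a NaN.
df = pd.DataFrame({"P_ACT_KW": [12.0, 466.0, 200.0, 400.0, np.nan]})

p = df["P_ACT_KW"]

# Series.where keeps the value where the condition is True, else uses 0.
df["high_energy"] = p.where(p > 400, 0)
# Series.between includes both endpoints by default (assumed here to be the intent).
df["medium_energy"] = p.where(p.between(200, 400), 0)
df["low_energy"] = p.where(p < 200, 0)

print(df)
```

Comparisons against NaN are false, so a row with a missing P_ACT_KW ends up with 0 in all three new columns. If such rows should stay missing instead, the conditions would need an extra check with p.isna().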