Pandas DataFrame column conditional on another column

Time: 2017-08-14 18:11:26

Tags: pandas dataframe series pandas-groupby

import pandas as pd
import urllib.request
import numpy as np
url="https://www.misoenergy.org/Library/Repository/Market%20Reports/20170811_da_bc.xls"

cnstxls = urllib.request.urlopen(url)
xl = pd.ExcelFile(cnstxls)
df = xl.parse("Sheet1",skiprows=3)
constr = df.iloc[:,1:7]
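# Raises TypeError: invalid type comparison (a Series compared with a tuple)
constr['Class'] = np.where(constr['Hour of Occurrence'] == (1,2,3,4,5,6), 'Offpeak', 'Onpeak')
# Raises TypeError: groupby() got multiple values for argument 'axis'
sumsp=constr.groupby('Constraint_ID','Class',axis=0)['Shadow Price'].sum().sort_values(ascending=True)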

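1) Creating the new column Class raises an error: TypeError: invalid type comparison. How can I set this new column based on several hours? The approach works when I compare against a single hour (1 or 2 or 3 ...).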

2) TypeError: groupby() got multiple values for argument 'axis'. I want to group by two columns. It works with one column.

1 Answer:

Answer 0 (score: 0)

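Try this. Use isin() to test whether each hour is in the list of off-peak hours, and pass the two grouping columns to groupby() as a single list: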

constr['Class'] = np.where(constr['Hour of Occurrence'].isin([1,2,3,4,5,6]),'Offpeak','Onpeak')

sumsp = constr.groupby(['Constraint_ID','Class'],axis=0)['Shadow Price'].sum().sort_values(ascending=True)

print(sumsp)

Output:

Constraint_ID  Class  
281292         Onpeak    -780.05
1049           Onpeak    -364.68
4636           Onpeak    -276.62
201082         Onpeak    -245.44
1607           Onpeak    -237.36
98333          Onpeak    -112.05
107318         Onpeak     -96.10
270366         Onpeak     -80.71
267644         Onpeak     -73.25
285770         Onpeak     -59.53
1049           Offpeak    -46.52
281292         Offpeak    -33.80
270888         Onpeak     -19.68
289484         Offpeak    -10.41
               Onpeak      -4.52
1607           Offpeak     -2.60
9712           Onpeak       0.84
268470         Onpeak       1.14
248010         Onpeak       1.48
287090         Onpeak       1.63
               Offpeak     11.78
188144         Offpeak     26.32
4862           Onpeak      28.03
285770         Offpeak     50.21
Name: Shadow Price, dtype: float64
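
For readers who cannot download the spreadsheet, the same two fixes can be tried on a small made-up frame. This is only a sketch: the column names match the question, but the rows and the offpeak_hours name are assumptions for illustration, not MISO data. The likely cause of the first error is that == tries to compare the whole column against a six-element tuple instead of testing membership. The second error comes from groupby() taking its keys as one argument, so in the pandas version used in the question a second positional string was read as axis.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the MISO sheet; these rows are invented for illustration.
constr = pd.DataFrame({
    "Constraint_ID": [1049, 1049, 1607, 1607, 4636],
    "Hour of Occurrence": [3, 14, 6, 7, 20],
    "Shadow Price": [-10.0, -20.0, -2.5, -7.5, -30.0],
})

# isin() tests each hour for membership in the list and returns a boolean Series.
offpeak_hours = [1, 2, 3, 4, 5, 6]
is_offpeak = constr["Hour of Occurrence"].isin(offpeak_hours)
constr["Class"] = np.where(is_offpeak, "Offpeak", "Onpeak")

# Both grouping columns go in one list, passed as the first argument (by).
sumsp = (
    constr.groupby(["Constraint_ID", "Class"])["Shadow Price"]
    .sum()
    .sort_values(ascending=True)
)
print(sumsp)
```

The result is a Series with a two-level index (Constraint_ID, Class), the same shape as the output shown above.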

Edit: use unstack to pivot Class into columns:

sumsp.unstack('Class')

Output:

Class          Offpeak  Onpeak
Constraint_ID                 
1049            -46.52 -364.68
1607             -2.60 -237.36
4636               NaN -276.62
4862               NaN   28.03
9712               NaN    0.84
98333              NaN -112.05
107318             NaN  -96.10
188144           26.32     NaN
201082             NaN -245.44
248010             NaN    1.48
267644             NaN  -73.25
268470             NaN    1.14
270366             NaN  -80.71
270888             NaN  -19.68
281292          -33.80 -780.05
285770           50.21  -59.53
287090           11.78    1.63
289484          -10.41   -4.52
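
The NaN entries appear wherever a constraint has no rows for one of the two classes. As a minimal sketch, the snippet below rebuilds three of the grouped sums from the output above by hand (the wide and wide_filled names are assumptions) and shows that the fill_value parameter of unstack can replace those gaps with 0, if a zero is an acceptable reading of "no shadow price recorded".

```python
import pandas as pd

# Three grouped sums copied from the output above, rebuilt by hand for illustration.
index = pd.MultiIndex.from_tuples(
    [(1049, "Offpeak"), (1049, "Onpeak"), (4636, "Onpeak")],
    names=["Constraint_ID", "Class"],
)
sumsp = pd.Series([-46.52, -364.68, -276.62], index=index, name="Shadow Price")

# unstack moves the 'Class' level from the row index into the columns.
wide = sumsp.unstack("Class")

# Missing (Constraint_ID, Class) pairs become NaN; fill_value substitutes a default.
wide_filled = sumsp.unstack("Class", fill_value=0.0)

print(wide)
print(wide_filled)
```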