import pandas as pd
import urllib.request
import numpy as np
url="https://www.misoenergy.org/Library/Repository/Market%20Reports/20170811_da_bc.xls"
cnstxls = urllib.request.urlopen(url)
xl = pd.ExcelFile(cnstxls)
df = xl.parse("Sheet1",skiprows=3)
constr = df.iloc[:,1:7]
constr['Class'] = np.where(constr['Hour of Occurrence'] == (1,2,3,4,5,6), 'Offpeak', 'Onpeak')
sumsp=constr.groupby('Constraint_ID','Class',axis=0)['Shadow Price'].sum().sort_values(ascending=True)
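1) Creating the new column Class raises an error: TypeError: invalid type comparison. How can I set this new column based on several hours? It works when I enter only one hour (1 or 2 or 3 ...).
2) TypeError: groupby() got multiple values for argument 'axis'. I want to group by two columns. It works with one column.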
Answer 0 (score: 0)
Try this:
constr['Class'] = np.where(constr['Hour of Occurrence'].isin([1,2,3,4,5,6]),'Offpeak','Onpeak')
sumsp = constr.groupby(['Constraint_ID','Class'],axis=0)['Shadow Price'].sum().sort_values(ascending=True)
print(sumsp)
Output:
Constraint_ID  Class
281292         Onpeak    -780.05
1049           Onpeak    -364.68
4636           Onpeak    -276.62
201082         Onpeak    -245.44
1607           Onpeak    -237.36
98333          Onpeak    -112.05
107318         Onpeak     -96.10
270366         Onpeak     -80.71
267644         Onpeak     -73.25
285770         Onpeak     -59.53
1049           Offpeak    -46.52
281292         Offpeak    -33.80
270888         Onpeak     -19.68
289484         Offpeak    -10.41
               Onpeak      -4.52
1607           Offpeak     -2.60
9712           Onpeak       0.84
268470         Onpeak       1.14
248010         Onpeak       1.48
287090         Onpeak       1.63
               Offpeak     11.78
188144         Offpeak     26.32
4862           Onpeak      28.03
285770         Offpeak     50.21
Name: Shadow Price, dtype: float64
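Why the two changes work, as a minimal sketch on made-up data: comparing a column to a tuple with == does not test membership, whereas isin checks each value against the list; and groupby takes several keys as a single list, because in the pandas version from the question a second positional argument is read as axis. The frame toy and its values are hypothetical and only mimic the column names of the real sheet; axis=0 is omitted here since it is the default.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the MISO sheet; the values are invented.
toy = pd.DataFrame({
    "Constraint_ID": [1049, 1049, 1607, 1607, 4636],
    "Hour of Occurrence": [3, 12, 6, 7, 20],
    "Shadow Price": [-10.0, -20.0, -1.5, -2.5, -5.0],
})

# isin tests membership element by element, so hours 1-6 are labelled Offpeak.
offpeak_hours = [1, 2, 3, 4, 5, 6]
toy["Class"] = np.where(
    toy["Hour of Occurrence"].isin(offpeak_hours), "Offpeak", "Onpeak"
)

# Several grouping keys are passed together as one list.
toy_sums = (
    toy.groupby(["Constraint_ID", "Class"])["Shadow Price"]
    .sum()
    .sort_values(ascending=True)
)
print(toy_sums)
```

The result is a Series with a two-level index (Constraint_ID, Class), the same shape as the output above.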
Use unstack to pivot Class into columns: sumsp.unstack('Class')
Output:
Class          Offpeak   Onpeak
Constraint_ID
1049            -46.52  -364.68
1607             -2.60  -237.36
4636               NaN  -276.62
4862               NaN    28.03
9712               NaN     0.84
98333              NaN  -112.05
107318             NaN   -96.10
188144           26.32      NaN
201082             NaN  -245.44
248010             NaN     1.48
267644             NaN   -73.25
268470             NaN     1.14
270366             NaN   -80.71
270888             NaN   -19.68
281292          -33.80  -780.05
285770           50.21   -59.53
287090           11.78     1.63
289484          -10.41    -4.52
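The NaN cells mark constraints that have no rows in one of the two classes. If a number is preferred there, unstack accepts a fill_value argument. The sketch below rebuilds three of the sums shown above by hand (the name toy_sums is hypothetical); whether 0 is the right placeholder depends on how the table will be used, so treat that choice as an assumption.

```python
import pandas as pd

# Three of the sums from the output above, rebuilt by hand for illustration.
idx = pd.MultiIndex.from_tuples(
    [(1049, "Offpeak"), (1049, "Onpeak"), (4636, "Onpeak")],
    names=["Constraint_ID", "Class"],
)
toy_sums = pd.Series([-46.52, -364.68, -276.62], index=idx, name="Shadow Price")

# Default behaviour: 4636 has no Offpeak entry, so that cell becomes NaN.
wide = toy_sums.unstack("Class")

# Same pivot, with missing combinations filled with 0 instead of NaN.
wide_filled = toy_sums.unstack("Class", fill_value=0)
print(wide_filled)
```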