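I'm trying to run a calculation across several dataframes at once and write the results to a new dataframe. The slowest part is the merge, whose result I use to compute values for the output dataframe. I'm wondering whether I could use something like a map that groups on one column and performs the calculation on another. I've looked at groupby, but I can't seem to get the data out of each successive group without iterating.
I don't know how fast that approach would be. I think that if I could group the site names in 'weather_weights_df', do some sort of merge with the current weather for each group, and do the same for each 'asset' at the same time, it would be really quick! Instead, I'm currently using a slow iterative approach.
Any help would be much appreciated! Here is my data: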
shape 6148, 3; total stations 28
station sitename weightStation2Site
EGJJ Forss 0.001371715
EGJJ Forss 0.001371795
EGJJ Findhorn Ecovillage 0.001398113
EGJJ Burradale 0.001412482
EGJJ Crystal Rig 0.001429332
EGJJ Boyndie Airfield 0.001433208
EGJJ Crystal Rig 0.001437219
EGJJ Beinn Tharsuinn 0.001462698
EGJJ Millennium Wind Farm 0.001706282
EGJJ Ben Aketil 0.001907344
EGJJ Ben Aketil 0.001910112
EGDR Burradale 0.001930678
EGDR Crystal Rig 0.001974192
EGDR Crystal Rig 0.001978165
EGDR Boyndie Airfield 0.002042599
EGDR Forss 0.002098689
EGDR Forss 0.002098831
EGDR Findhorn Ecovillage 0.002124658
EGJJ Dun Law 0.002197881
EGJJ Black Law Wind Farm 0.002259689
DateTime wind_speed_ms station
2017-01-01 00:00:00+00:00 2 EGPH
2017-01-01 00:00:00+00:00 5 EGDM
2017-01-01 00:00:00+00:00 5 EGPD
2017-01-01 00:00:00+00:00 5 EGOM
2017-01-01 00:00:00+00:00 5 EGNM
2017-01-01 00:00:00+00:00 6 EGNS
2017-01-01 00:00:00+00:00 3 EGLL
2017-01-01 00:00:00+00:00 6 EGMD
2017-01-01 00:00:00+00:00 4 EGGP
2017-01-01 00:00:00+00:00 8 EGDR
2017-01-01 00:00:00+00:00 4 EGCC
2017-01-01 00:00:00+00:00 5 EGYM
2017-01-01 00:00:00+00:00 5 EGXW
2017-01-01 00:00:00+00:00 4 EGBB
2017-01-01 00:00:00+00:00 5 EGXE
2017-01-01 00:00:00+00:00 5 EGVO
2017-01-01 00:00:00+00:00 4 EGVN
2017-01-01 00:00:00+00:00 5 EGUW
2017-01-01 00:00:00+00:00 4 EGQL
2017-01-01 00:00:00+00:00 4 EGUB
2017-01-01 00:00:00+00:00 6 EGPN
2017-01-01 00:00:00+00:00 3 EGPF
2017-01-01 00:00:00+00:00 5 EGOS
2017-01-01 00:00:00+00:00 4 EGNT
2017-01-01 00:00:00+00:00 6 EGJJ
2017-01-01 00:00:00+00:00 6 EGDX
2017-01-01 00:00:00+00:00 8 EGQK
2017-01-01 00:00:00+00:00 3 EGWU
shape 234,
sitename capacity 2015 asset capacity fraction 2016 asset capacity fraction 2017 asset capacity fraction 2018 asset capacity fraction 2019 asset capacity fraction
Findhorn Ecovillage 0.75 5.81E-05 5.53E-05 4.80E-05 4.12E-05 3.41E-05
Delabole wind farm 9.2 0.000713178 0.000677816 0.000588951 0.000505078 0.000417859
Llandinam P&L 30.9 0.002395349 0.002276579 0.001978106 0.001696404 0.001403461
Rhyd-y-Groes 7.2 0.00055814 0.000530465 0.000460918 0.000395279 0.00032702
Blood Hill wind farm 2.25 0.000174419 0.00016577 0.000144037 0.000123525 0.000102194
Haverigg 1.1 8.53E-05 8.10E-05 7.04E-05 6.04E-05 5.00E-05
Carland Cross 20 0.001550388 0.001473514 0.001280328 0.001097996 0.000908389
Chelker Reservoir 1.2 9.30E-05 8.84E-05 7.68E-05 6.59E-05 5.45E-05
Coal Clough Wind Farm 9.6 0.000744186 0.000707287 0.000614557 0.000527038 0.000436027
Taff Ely 9 0.000697674 0.000663081 0.000576147 0.000494098 0.000408775
Cold Northcott 6.8 0.000527132 0.000500995 0.000435311 0.000373319 0.000308852
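To get a simple working version, I iterate over the 'asset_capacity_clean' dataset, merge 'todays_weather_data' with 'station_weights_df' for each asset, and calculate a new column from the merged data. It works, but it is far too slow to be usable. It would be fine if I only had a few weeks of data, but I have 10 years' worth, so there is a different 'todays_weather_data' df for every day of those 10 years! Here is the code: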
import numpy as np
import pandas as pd

# Body of a function that is called once per day; cur_date is that day.
df = all_asset_heuristics_df.copy()
for i, asset in asset_capacity_clean.iterrows():
    # Join today's wind speeds to the station weights for this one site.
    temp = pd.merge(
        todays_weather_data[['wind_speed_ms', 'station']],
        station_weights_df[station_weights_df['sitename'] == asset['sitename']],
        how='left', on='station')
    temp['weighted wind'] = temp['wind_speed_ms'] * temp['weightStation2Site']
    year_colname = str(asset_fraction_per_year_dict[cur_date.year])  # capacity-fraction column for the current year
    asset_heuristic = sum(temp['weighted wind']) * asset[year_colname]  # heuristic for this asset in that year
    if np.isfinite(asset_heuristic):
        line = {'DateTime': cur_date,
                'sitename': asset['sitename'],
                'asset_heuristic': asset_heuristic}
        # DataFrame.append was removed in pandas 2.0; pd.concat does the same job.
        df = pd.concat([df, pd.DataFrame([line])], ignore_index=True)
return df
The output I want is a dataframe containing the 'asset_heuristic' calculated in the code. The first few 'sitename' rows are shown below, for example:
DateTime asset_heuristic sitename
0 2017-01-01 0.000439 Findhorn Ecovillage
1 2017-01-01 0.010087 Delabole wind farm
2 2017-01-01 0.032209 Llandinam P&L
3 2017-01-01 0.007630 Rhyd-y-Groes
4 2017-01-01 0.002362 Blood Hill wind farm
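
For what it's worth, the kind of thing I have in mind is replacing the per-asset loop with one merge and one groupby. Below is a minimal sketch of that idea, not a tested solution. It assumes the column names shown above; the function name asset_heuristics_for_day and the tiny made-up frames are purely illustrative. One difference from the loop: pandas' .sum() skips NaN and the inner join drops stations with no weight for a site, whereas the built-in sum() in the loop returns NaN (and so skips the asset) whenever a station has no weight for that site.

```python
import numpy as np
import pandas as pd


def asset_heuristics_for_day(todays_weather_data, station_weights_df,
                             asset_capacity_clean, year_colname, cur_date):
    """Hypothetical vectorised version of the per-asset loop for one day."""
    # One merge for all sites: every (station, sitename) weight row
    # picks up that station's wind speed for the day.
    merged = station_weights_df.merge(
        todays_weather_data[["station", "wind_speed_ms"]],
        on="station", how="inner")
    merged["weighted wind"] = (merged["wind_speed_ms"]
                               * merged["weightStation2Site"])

    # Sum the weighted wind per site without iterating over groups.
    site_wind = merged.groupby("sitename", as_index=False)["weighted wind"].sum()

    # Attach each asset's capacity fraction for the year and scale.
    out = asset_capacity_clean[["sitename", year_colname]].merge(
        site_wind, on="sitename", how="inner")
    out["asset_heuristic"] = out["weighted wind"] * out[year_colname]
    out["DateTime"] = cur_date

    # Same finiteness filter as in the loop.
    out = out[np.isfinite(out["asset_heuristic"])]
    return out[["DateTime", "asset_heuristic", "sitename"]].reset_index(drop=True)


# Made-up toy data (not my real data), just to show the shapes involved.
weights = pd.DataFrame({
    "station": ["A", "B", "A"],
    "sitename": ["Site X", "Site X", "Site Y"],
    "weightStation2Site": [0.5, 0.25, 1.0],
})
weather = pd.DataFrame({
    "DateTime": pd.Timestamp("2017-01-01", tz="UTC"),
    "wind_speed_ms": [4, 8],
    "station": ["A", "B"],
})
assets = pd.DataFrame({
    "sitename": ["Site X", "Site Y", "Site Z"],  # Site Z has no weights
    "capacity": [1.0, 2.0, 3.0],
    "2017 asset capacity fraction": [0.1, 0.2, 0.3],
})

result = asset_heuristics_for_day(
    weather, weights, assets,
    year_colname="2017 asset capacity fraction",
    cur_date=pd.Timestamp("2017-01-01"))
print(result)
```

If something like this is valid, I suppose the same merge could be done for all days at once by keeping 'DateTime' in the merge and adding it to the groupby keys, but I don't know whether that would fit in memory for 10 years of data.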