Performing calculations on subsets of multiple pandas DataFrames without merging

Time: 2019-04-15 11:19:38

Tags: python pandas dataframe mapping pandas-groupby

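I am trying to perform a calculation across several DataFrames at once and write the result to a new DataFrame. The slowest part is the merge operation, whose result I use to calculate some values for the output DataFrame. I am wondering whether I could use something like a map that groups on one column and performs the calculation on another. I have looked at groupby, but I can't seem to get the data out of the resulting groups without iterating over them.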

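I don't know how fast such an approach would be. I think that if I could group 'station_weights_df' by site name, do some kind of merge of each group with the current weather, and do this for every 'asset' at the same time, it would be very fast! At the moment I am using a slow iterative approach.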
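The map-then-group idea could be sketched roughly as follows. This is a minimal illustration, not a tested solution for the full dataset: the tiny frames are made up (only the column names come from the question), and it assumes each station appears at most once in the day's weather data, since `Series.map` needs a unique lookup index.

```python
import pandas as pd

# Made-up sample frames that reuse the question's column names.
station_weights_df = pd.DataFrame({
    "station": ["EGJJ", "EGDR", "EGJJ", "EGDR"],
    "sitename": ["Forss", "Forss", "Burradale", "Burradale"],
    "weightStation2Site": [0.001, 0.002, 0.003, 0.004],
})
todays_weather_data = pd.DataFrame({
    "station": ["EGJJ", "EGDR"],
    "wind_speed_ms": [6, 8],
})

# Lookup table: station code -> wind speed for the day (stations must be unique).
wind_by_station = todays_weather_data.set_index("station")["wind_speed_ms"]

# Look up the wind speed for every weight row; no merge is involved.
wind = station_weights_df["station"].map(wind_by_station)
weighted = wind * station_weights_df["weightStation2Site"]

# One weighted-wind total per site, for all sites at once.
weighted_wind_by_site = weighted.groupby(station_weights_df["sitename"]).sum()
print(weighted_wind_by_site)
```

Stations missing from the weather data map to NaN, and `groupby(...).sum()` skips NaN by default, so a site with partial station coverage still gets a total here.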

Any help would be greatly appreciated! Here is my data:

station_weights_df

Shape: (6148, 3); 28 stations in total

station  sitename              weightStation2Site
EGJJ     Forss                 0.001371715
EGJJ     Forss                 0.001371795
EGJJ     Findhorn Ecovillage   0.001398113
EGJJ     Burradale             0.001412482
EGJJ     Crystal Rig           0.001429332
EGJJ     Boyndie Airfield      0.001433208
EGJJ     Crystal Rig           0.001437219
EGJJ     Beinn Tharsuinn       0.001462698
EGJJ     Millennium Wind Farm  0.001706282
EGJJ     Ben Aketil            0.001907344
EGJJ     Ben Aketil            0.001910112
EGDR     Burradale             0.001930678
EGDR     Crystal Rig           0.001974192
EGDR     Crystal Rig           0.001978165
EGDR     Boyndie Airfield      0.002042599
EGDR     Forss                 0.002098689
EGDR     Forss                 0.002098831
EGDR     Findhorn Ecovillage   0.002124658
EGJJ     Dun Law               0.002197881
EGJJ     Black Law Wind Farm   0.002259689

todays_weather_data

DateTime                    wind_speed_ms  station
2017-01-01 00:00:00+00:00   2              EGPH
2017-01-01 00:00:00+00:00   5              EGDM
2017-01-01 00:00:00+00:00   5              EGPD
2017-01-01 00:00:00+00:00   5              EGOM
2017-01-01 00:00:00+00:00   5              EGNM
2017-01-01 00:00:00+00:00   6              EGNS
2017-01-01 00:00:00+00:00   3              EGLL
2017-01-01 00:00:00+00:00   6              EGMD
2017-01-01 00:00:00+00:00   4              EGGP
2017-01-01 00:00:00+00:00   8              EGDR
2017-01-01 00:00:00+00:00   4              EGCC
2017-01-01 00:00:00+00:00   5              EGYM
2017-01-01 00:00:00+00:00   5              EGXW
2017-01-01 00:00:00+00:00   4              EGBB
2017-01-01 00:00:00+00:00   5              EGXE
2017-01-01 00:00:00+00:00   5              EGVO
2017-01-01 00:00:00+00:00   4              EGVN
2017-01-01 00:00:00+00:00   5              EGUW
2017-01-01 00:00:00+00:00   4              EGQL
2017-01-01 00:00:00+00:00   4              EGUB
2017-01-01 00:00:00+00:00   6              EGPN
2017-01-01 00:00:00+00:00   3              EGPF
2017-01-01 00:00:00+00:00   5              EGOS
2017-01-01 00:00:00+00:00   4              EGNT
2017-01-01 00:00:00+00:00   6              EGJJ
2017-01-01 00:00:00+00:00   6              EGDX
2017-01-01 00:00:00+00:00   8              EGQK
2017-01-01 00:00:00+00:00   3              EGWU

asset_capacity_clean

Shape: 234 rows

sitename                capacity    2015 asset capacity fraction    2016 asset capacity fraction    2017 asset capacity fraction    2018 asset capacity fraction    2019 asset capacity fraction
Findhorn Ecovillage     0.75        5.81E-05                        5.53E-05                        4.80E-05                        4.12E-05                        3.41E-05
Delabole wind farm      9.2         0.000713178                     0.000677816                     0.000588951                     0.000505078                     0.000417859
Llandinam P&L           30.9        0.002395349                     0.002276579                     0.001978106                     0.001696404                     0.001403461
Rhyd-y-Groes            7.2         0.00055814                      0.000530465                     0.000460918                     0.000395279                     0.00032702
Blood Hill wind farm    2.25        0.000174419                     0.00016577                      0.000144037                     0.000123525                     0.000102194
Haverigg                1.1         8.53E-05                        8.10E-05                        7.04E-05                        6.04E-05                        5.00E-05
Carland Cross           20          0.001550388                     0.001473514                     0.001280328                     0.001097996                     0.000908389
Chelker Reservoir       1.2         9.30E-05                        8.84E-05                        7.68E-05                        6.59E-05                        5.45E-05
Coal Clough Wind Farm   9.6         0.000744186                     0.000707287                     0.000614557                     0.000527038                     0.000436027
Taff Ely                9           0.000697674                     0.000663081                     0.000576147                     0.000494098                     0.000408775
Cold Northcott          6.8         0.000527132                     0.000500995                     0.000435311                     0.000373319                     0.000308852

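To get a simple working version, I iterate over the 'asset_capacity_clean' dataset, merge 'todays_weather_data' with 'station_weights_df' for each asset, and calculate a new column from the merged data. It works, but it is far too slow to be usable. It would be fine if I only had a few weeks of data, but I have 10 years' worth, with a different DataFrame for 'todays_weather_data' for every day of those 10 years! Here is the code: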

import numpy as np
import pandas as pd

# The enclosing function was not shown in the original post; the name and
# parameters below are placeholders. The other frames are defined elsewhere.
def calculate_asset_heuristics(todays_weather_data, cur_date):
    df = all_asset_heuristics_df.copy()
    for i, asset in asset_capacity_clean.iterrows():
        # Merge today's weather with the station weights for this asset's site only
        temp = pd.merge(todays_weather_data[['wind_speed_ms', 'station']],
                        station_weights_df[station_weights_df['sitename'] == asset['sitename']],
                        how='left', on='station')
        temp['weighted wind'] = temp['wind_speed_ms'] * temp['weightStation2Site']

        year_colname = str(asset_fraction_per_year_dict[cur_date.year])      # column name for the current year
        asset_heuristic = sum(temp['weighted wind']) * asset[year_colname]   # calculates heuristic based on which year it is

        if np.isfinite(asset_heuristic):
            line = {}
            line['DateTime'] = cur_date
            line['sitename'] = asset['sitename']
            line['asset_heuristic'] = asset_heuristic
            # DataFrame.append was removed in pandas 2.0; pd.concat does the same job
            df = pd.concat([df, pd.DataFrame([line])], ignore_index=True)

    return df
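For comparison, one possible way to get the same kind of result for a single day without the per-asset loop is to merge once, group by 'sitename', and then scale by the year's capacity fraction. The sketch below is only an illustration under stated assumptions: the function name `asset_heuristics_for_day` and the sample values are made up, and the year column name is passed in directly rather than looked up in 'asset_fraction_per_year_dict'. One behavioural difference to be aware of: the loop uses Python's built-in `sum`, which returns NaN when any weather station has no weight for a site, whereas this version sums whatever station weights exist.

```python
import numpy as np
import pandas as pd


def asset_heuristics_for_day(todays_weather_data, station_weights_df,
                             asset_capacity_clean, cur_date, year_colname):
    """Hypothetical helper: one merge and one groupby instead of a merge per asset."""
    # Attach the day's wind speed to every (station, sitename) weight row.
    merged = station_weights_df.merge(
        todays_weather_data[["station", "wind_speed_ms"]], on="station", how="inner")
    merged["weighted wind"] = merged["wind_speed_ms"] * merged["weightStation2Site"]

    # Total weighted wind per site.
    site_wind = merged.groupby("sitename", as_index=False)["weighted wind"].sum()

    # Bring in each asset's capacity fraction for the year and scale by it.
    out = asset_capacity_clean[["sitename", year_colname]].merge(
        site_wind, on="sitename", how="inner")
    out["asset_heuristic"] = out["weighted wind"] * out[year_colname]
    out["DateTime"] = cur_date

    # Keep only finite results, mirroring the np.isfinite check in the loop.
    out = out[np.isfinite(out["asset_heuristic"])]
    return out[["DateTime", "sitename", "asset_heuristic"]].reset_index(drop=True)


# Made-up sample data that reuses the question's column names.
station_weights_df = pd.DataFrame({
    "station": ["EGJJ", "EGDR", "EGJJ", "EGDR"],
    "sitename": ["Forss", "Forss", "Burradale", "Burradale"],
    "weightStation2Site": [0.001, 0.002, 0.003, 0.004],
})
todays_weather_data = pd.DataFrame({
    "station": ["EGJJ", "EGDR"],
    "wind_speed_ms": [6, 8],
})
asset_capacity_clean = pd.DataFrame({
    "sitename": ["Forss", "Burradale", "Site without weights"],
    "2017 asset capacity fraction": [0.5, 0.1, 0.2],
})

result = asset_heuristics_for_day(
    todays_weather_data, station_weights_df, asset_capacity_clean,
    cur_date=pd.Timestamp("2017-01-01"),
    year_colname="2017 asset capacity fraction",
)
print(result)
```

Sites that have no rows in 'station_weights_df' drop out at the inner merge, which corresponds to the loop skipping non-finite heuristics.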

The output I want is a DataFrame containing the 'asset_heuristic' calculated in the code. The first few 'sitename' rows are shown below, for example:

    DateTime  asset_heuristic              sitename
0 2017-01-01         0.000439   Findhorn Ecovillage
1 2017-01-01         0.010087    Delabole wind farm
2 2017-01-01         0.032209         Llandinam P&L
3 2017-01-01         0.007630          Rhyd-y-Groes
4 2017-01-01         0.002362  Blood Hill wind farm
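Since there is one weather DataFrame per day for 10 years, another option that might be worth exploring is to stack all days into one long frame and treat the weighted sum as a matrix product: a DateTime x station matrix of wind speeds times a station x sitename matrix of weights. The sketch below is an assumption-laden illustration: `all_weather` is a hypothetical frame holding several days, the values are made up, duplicate (station, sitename) weights are summed, missing readings are treated as zero, and the per-year capacity fraction is left out.

```python
import pandas as pd

# Hypothetical long frame holding several days of weather at once.
all_weather = pd.DataFrame({
    "DateTime": pd.to_datetime(
        ["2017-01-01", "2017-01-01", "2017-01-02", "2017-01-02"]),
    "station": ["EGJJ", "EGDR", "EGJJ", "EGDR"],
    "wind_speed_ms": [6, 8, 2, 4],
})
station_weights_df = pd.DataFrame({
    "station": ["EGJJ", "EGDR", "EGJJ", "EGDR"],
    "sitename": ["Forss", "Forss", "Burradale", "Burradale"],
    "weightStation2Site": [0.001, 0.002, 0.003, 0.004],
})

# DateTime x station matrix of wind speeds
# (pivot requires one reading per station per timestamp).
wind = all_weather.pivot(index="DateTime", columns="station",
                         values="wind_speed_ms")

# station x sitename matrix of weights; duplicate pairs are summed.
weights = station_weights_df.pivot_table(
    index="station", columns="sitename", values="weightStation2Site",
    aggfunc="sum", fill_value=0.0)

# Align the station axis of the weights with the wind matrix columns.
weights = weights.reindex(wind.columns, fill_value=0.0)

# Weighted wind for every (DateTime, sitename) pair in one matrix product.
weighted_wind = wind.fillna(0.0).dot(weights)
print(weighted_wind)
```

The result has one row per timestamp and one column per site; multiplying each year's rows by that year's capacity fraction column would be the remaining step.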

0 answers:

No answers yet