Looping over a large dataset is suboptimal

Time: 2017-10-18 16:50:08

Tags: python dataframe

I have a DataFrame with a few thousand rows of artificial foreign exchange trade data. The first ten rows look like this:

[Image: the first ten rows of the DataFrame; not reproduced here]

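I want to iterate over this set and, for each row, calculate CommonCurrency, which in this case would be USD. So for each row I look at the CurrencyPair, DeskRate, and OrderQty columns and calculate CommonCurrency: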

for i in range(len(order_data)):
    if (order_data['CurrencyPair'][i] == 'GBP/USD'):
        order_data['CommonCurrency'][i] = order_data['DeskRate'][i] * order_data['OrderQty'][i]
    elif (order_data['CurrencyPair'][i] == 'AUD/USD'):
        order_data['CommonCurrency'][i] = order_data['DeskRate'][i] * order_data['OrderQty'][i]
    elif (order_data['CurrencyPair'][i] == 'EUR/USD'):
        order_data['CommonCurrency'][i] = order_data['DeskRate'][i] * order_data['OrderQty'][i]
    elif (order_data['CurrencyPair'][i] == 'USD/CHF'):
        order_data['CommonCurrency'][i] = order_data['DeskRate'][i] / order_data['OrderQty'][i]
    elif (order_data['CurrencyPair'][i] == 'EUR/GBP'):
        order_data['CommonCurrency'][i] = ...  # different calculation

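This does not seem like the right way to do it, especially if there are a large number of different currency pairs. Another problem I run into is when I reach EUR/GBP, because I then have to use the DeskRate from the GBP/USD and EUR/USD rows at the same time, and I cannot see how to do that with this approach.

Any tips?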

1 Answer:

Answer 0: (score: 2)

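An interesting feature of pandas is the concept of indexing. There are more Pythonic ways to do this, but with loc you can use a Series (column) to assign values to part of a DataFrame: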

# Write the result to CommonCurrency, only for the rows matching each condition
order_data.loc[order_data['CurrencyPair'].isin(('GBP/USD', 'AUD/USD', 'EUR/USD')), 'CommonCurrency'] = order_data['DeskRate'] * order_data['OrderQty']
order_data.loc[order_data['CurrencyPair'] == 'USD/CHF', 'CommonCurrency'] = order_data['DeskRate'] / order_data['OrderQty']
# some_func stands in for whatever calculation EUR/GBP needs
order_data.loc[order_data['CurrencyPair'] == 'EUR/GBP', 'CommonCurrency'] = some_func(order_data['DeskRate'], order_data['OrderQty'])

This avoids any for loop.
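For reference, here is a minimal runnable sketch of the loc approach on a small made-up DataFrame. The sample rates and quantities are invented for illustration, and the EUR/GBP step is only an assumption about what the "different calculation" might be: it multiplies by the EUR/GBP rate and then by a GBP/USD rate taken from the first GBP/USD row. The multiply and divide formulas for the other pairs are copied from the question as written.

```python
import pandas as pd

# Made-up sample data for illustration; not the asker's actual data.
order_data = pd.DataFrame({
    "CurrencyPair": ["GBP/USD", "AUD/USD", "EUR/USD", "USD/CHF", "EUR/GBP"],
    "DeskRate": [1.25, 0.75, 1.125, 0.5, 0.875],
    "OrderQty": [1000.0, 2000.0, 3000.0, 4000.0, 5000.0],
})

# Create the target column up front so every row has a defined (NaN) value.
order_data["CommonCurrency"] = float("nan")

pair = order_data["CurrencyPair"]
product = order_data["DeskRate"] * order_data["OrderQty"]

# Pairs quoted against USD: same formula as in the question's loop.
usd_quoted = pair.isin(["GBP/USD", "AUD/USD", "EUR/USD"])
order_data.loc[usd_quoted, "CommonCurrency"] = product

# USD/CHF: same division as in the question's loop.
usd_chf = pair == "USD/CHF"
order_data.loc[usd_chf, "CommonCurrency"] = order_data["DeskRate"] / order_data["OrderQty"]

# EUR/GBP (assumed calculation): convert via a GBP/USD rate found in the data.
gbp_usd_rate = order_data.loc[pair == "GBP/USD", "DeskRate"].iloc[0]
eur_gbp = pair == "EUR/GBP"
order_data.loc[eur_gbp, "CommonCurrency"] = product * gbp_usd_rate

print(order_data)
```

Because the right-hand side is a full Series, pandas aligns it on the index and only writes the rows selected by the boolean mask. With many currency pairs, the same pattern can be driven from a list or dict of pairs instead of one statement per pair.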