Aggregating a list of dictionaries into bins by applying a weighted average in Python

Time: 2018-08-28 18:59:44

Tags: python list dictionary weighted-average

I have a list of dictionaries that looks like this:

_input = [{'cumulated_quantity': 30, 'price': 7000, 'quantity': 30},
         {'cumulated_quantity': 80, 'price': 7002, 'quantity': 50},
         {'cumulated_quantity': 130, 'price': 7010, 'quantity': 50},
         {'cumulated_quantity': 330, 'price': 7050, 'quantity': 200},
         {'cumulated_quantity': 400, 'price': 7065, 'quantity': 70}]

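I want to group the dictionaries into bins of quantity 100 each, where the price of each bin is the quantity-weighted average of the prices that fall into it. The result should look like this: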

result = [{'cumulated_quantity': 100, 'price': 7003, 'quantity': 100},
          {'cumulated_quantity': 200, 'price': 7038, 'quantity': 100},
          {'cumulated_quantity': 300, 'price': 7050, 'quantity': 100},
          {'cumulated_quantity': 400, 'price': 7060.5, 'quantity': 100}]

The weighted averages in the result dictionaries are computed as follows:

7003 = (30*7000+50*7002+20*7010)/100 
7038 = (30*7010+70*7050)/100
7050 = 100*7050/100
7060.5 = (30*7050+70*7065)/100
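
The arithmetic above can be checked with a small helper. This is only a minimal sketch: `weighted_average` is a hypothetical name that does not appear in the question, and it assumes each bin is described by the (quantity, price) slices that fall into it.

```python
def weighted_average(slices):
    """Return the quantity-weighted average price of (quantity, price) pairs."""
    total_quantity = sum(quantity for quantity, _ in slices)
    weighted_sum = sum(quantity * price for quantity, price in slices)
    return weighted_sum / total_quantity


# The four bins from the question, written as (quantity, price) slices.
bins = [
    [(30, 7000), (50, 7002), (20, 7010)],
    [(30, 7010), (70, 7050)],
    [(100, 7050)],
    [(30, 7050), (70, 7065)],
]
for slices in bins:
    print(weighted_average(slices))
```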

I managed to get this result using pandas DataFrames, but that approach is too slow (about 0.5 seconds). Is there a fast way to do this in plain Python?

2 answers:

Answer 0: (score: 0)

Without pandas, this is fairly easy to do by hand:

result = []
cumulative_quantity = 0
bucket = {'price': 0.0, 'quantity': 0}
for dct in _input:  # _input is the list of dicts from the question
    dct_quantity = dct['quantity']  # enables non-destructive decrementing
    while dct_quantity > 0:
        if bucket['quantity'] == 100:
            bucket['cumulative_quantity'] = cumulative_quantity
            result.append(bucket)
            bucket = {'price': 0.0, 'quantity': 0}
        added_quantity = min([dct_quantity, 100 - bucket['quantity']])
        bucket['price'] = (bucket['price'] * bucket['quantity'] + dct['price'] * added_quantity) / (bucket['quantity'] + added_quantity)
        dct_quantity -= added_quantity
        bucket['quantity'] += added_quantity
        cumulative_quantity += added_quantity
if bucket['quantity'] != 0:
    bucket['cumulative_quantity'] = cumulative_quantity
    result.append(bucket)

which gives

>>> result
[{'cumulative_quantity': 100, 'price': 7003.0, 'quantity': 100}, 
 {'cumulative_quantity': 200, 'price': 7038.0, 'quantity': 100}, 
 {'cumulative_quantity': 300, 'price': 7050.0, 'quantity': 100}, 
 {'cumulative_quantity': 400, 'price': 7060.5, 'quantity': 100}]

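This runs in linear time, O(p), where p is the total number of pieces the input is cut into. Equivalently it is O(n * k), where n is the number of dictionaries and k is the average number of pieces each dictionary has to be split into (k = 1.6 in your example: 8 pieces from 5 dictionaries).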
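
The piece count behind that estimate can be reproduced with a short sketch. `count_pieces` and `records` are hypothetical names used only for illustration; the function walks the input the same way the loop above does, but only counts how many slices are taken.

```python
def count_pieces(records, bin_size=100):
    """Count how many slices the records are cut into when filling bins."""
    pieces = 0
    filled = 0  # quantity already placed in the current bin
    for record in records:
        remaining = record['quantity']
        while remaining > 0:
            taken = min(remaining, bin_size - filled)
            remaining -= taken
            filled = (filled + taken) % bin_size  # wraps to 0 when the bin is full
            pieces += 1
    return pieces


# Same data as in the question.
records = [{'cumulated_quantity': 30, 'price': 7000, 'quantity': 30},
           {'cumulated_quantity': 80, 'price': 7002, 'quantity': 50},
           {'cumulated_quantity': 130, 'price': 7010, 'quantity': 50},
           {'cumulated_quantity': 330, 'price': 7050, 'quantity': 200},
           {'cumulated_quantity': 400, 'price': 7065, 'quantity': 70}]

p = count_pieces(records)
print(p, p / len(records))  # 8 1.6
```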

Answer 1: (score: 0)

BIN_SIZE = 100

cum_quantity = 0
value = 0.
bin_quantity = 0
bin_value = 0
results = []

for record in _input:
    price, quantity = record['price'], record['quantity']
    while quantity:
        prior_quantity = bin_quantity
        bin_quantity = min(BIN_SIZE, bin_quantity + quantity)
        quantity_delta = bin_quantity - prior_quantity
        bin_value += quantity_delta * price
        quantity -= quantity_delta
        if bin_quantity == BIN_SIZE:
            avg_price = bin_value / float(BIN_SIZE)
            cum_quantity += BIN_SIZE
            bin_quantity = bin_value = 0  # Reset bin values.
            results.append({'cumulated_quantity': cum_quantity,
                            'price': avg_price,
                            'quantity': BIN_SIZE})


# Add stub for anything left in remaining bin (optional).
if bin_quantity:
    results.append({'cumulated_quantity': cum_quantity + bin_quantity,
                    'price': bin_value / float(bin_quantity),
                    'quantity': bin_quantity})

>>> results
[{'cumulated_quantity': 100, 'price': 7003.0, 'quantity': 100},
 {'cumulated_quantity': 200, 'price': 7038.0, 'quantity': 100},
 {'cumulated_quantity': 300, 'price': 7050.0, 'quantity': 100},
 {'cumulated_quantity': 400, 'price': 7060.5, 'quantity': 100}]
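
To compare against the roughly 0.5 seconds reported for the pandas version, the loop can be wrapped in a function and timed with the standard-library `timeit` module. The following is a sketch under assumptions, not a benchmark: `bin_records` and `records` are hypothetical names, the logic follows the same idea as the loop above, and actual timings depend on the machine and input size.

```python
import timeit


def bin_records(records, bin_size=100):
    """Group records into bins of bin_size with quantity-weighted prices."""
    results = []
    cum_quantity = 0
    bin_quantity = 0
    bin_value = 0  # sum of quantity * price inside the current bin
    for record in records:
        price, quantity = record['price'], record['quantity']
        while quantity > 0:
            taken = min(quantity, bin_size - bin_quantity)
            bin_quantity += taken
            bin_value += taken * price
            quantity -= taken
            if bin_quantity == bin_size:
                cum_quantity += bin_size
                results.append({'cumulated_quantity': cum_quantity,
                                'price': bin_value / bin_size,
                                'quantity': bin_size})
                bin_quantity = bin_value = 0  # start a new bin
    if bin_quantity:  # partially filled last bin
        results.append({'cumulated_quantity': cum_quantity + bin_quantity,
                        'price': bin_value / bin_quantity,
                        'quantity': bin_quantity})
    return results


# Same data as in the question.
records = [{'cumulated_quantity': 30, 'price': 7000, 'quantity': 30},
           {'cumulated_quantity': 80, 'price': 7002, 'quantity': 50},
           {'cumulated_quantity': 130, 'price': 7010, 'quantity': 50},
           {'cumulated_quantity': 330, 'price': 7050, 'quantity': 200},
           {'cumulated_quantity': 400, 'price': 7065, 'quantity': 70}]

runs = 1000
seconds = timeit.timeit(lambda: bin_records(records), number=runs)
print(f"{seconds / runs * 1e6:.1f} microseconds per call")
```

Passing a different `bin_size` changes the bin width without touching the loop.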