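Finding the median from a dictionary of values and occurrence counts?

Asked: 2018-03-26 14:37:23

Tags: python python-3.x

I have a dictionary that looks like this (though much larger):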

{100: 8,
 110: 2,
 1000: 4,
 2200: 3,
 4000: 1,
 11000: 1,
}

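Each pair consists of value: number of occurrences in my data set. I need to calculate the median of the data set. Any hints or ideas on how to do it?

I am using Python 3.6.

Edit:

I do not want to create a list (because of the size of my data set). In fact, the size of the list is the reason for using a dictionary in the first place. So I am looking for another way.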
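For reference, the expected result for the small sample above can be pinned down by brute force. This is only a sanity check on the 19 sample values, not a solution for the real data, since it builds exactly the list the question wants to avoid. The names `counts`, `expanded` and `expected_median` are illustrative.

```python
import statistics

counts = {100: 8, 110: 2, 1000: 4, 2200: 3, 4000: 1, 11000: 1}

# Total number of data points represented by the dictionary.
total = sum(counts.values())

# Brute force: repeat each value by its count. Fine for 19 values,
# not viable for the large data set described in the question.
expanded = [value for value, n in sorted(counts.items()) for _ in range(n)]

expected_median = statistics.median(expanded)
print(total, expected_median)  # prints: 19 110
```

With 19 values the median is the 10th smallest one; the first 8 are 100 and the next 2 are 110, so the answer is 110.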

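3 answers:

Answer 0 (score: 1)

I believe this solution works as well, at least for positive numbers. I tested it on a few data sets alongside your answer, and as far as I can tell they give the same results.

(sorted_dict is the dictionary sorted by its numeric keys)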

    def find_median(sorted_dict):
        keys = list(sorted_dict.keys())
        counts = list(sorted_dict.values())
        half = sum(counts) / 2
        sum_var = 0
        index = 0
        # Find the index of the first key at which the running total has
        # already reached the middle of the data set. enumerate() is used
        # instead of list.index(), which picks the wrong position when two
        # keys share the same count.
        for index, val in enumerate(counts):
            if half - sum_var > 0:
                sum_var += val
            else:
                break
        # Compare how many records lie before and from that key onwards
        before = sum(counts[:index])
        after = sum(counts[index:])
        if after > before:
            return keys[index]
        if after < before:
            return keys[index - 1]
        # Equal halves: the median is the mean of the two neighbouring keys
        return (keys[index - 1] + keys[index]) / 2
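The idea of locating where the running total of counts crosses the midpoint can also be sketched with `itertools.accumulate` and `bisect`. This is an alternative formulation, not the answer's code; the function name `median_from_counts` is made up for illustration, and the sketch assumes a non-empty dictionary with non-negative integer counts. It only builds lists as long as the number of distinct values, never the full data set.

```python
from bisect import bisect_right
from itertools import accumulate


def median_from_counts(counts_by_value):
    """Median of a data set given as {value: number of occurrences}."""
    values = sorted(counts_by_value)
    # cumulative[i] = how many data points are <= values[i]
    cumulative = list(accumulate(counts_by_value[v] for v in values))
    total = cumulative[-1]
    # 0-based positions of the middle element(s) in the virtual sorted data;
    # both are the same position when total is odd.
    lower_pos = (total - 1) // 2
    upper_pos = total // 2
    # The first key whose cumulative count exceeds a position holds it.
    lower = values[bisect_right(cumulative, lower_pos)]
    upper = values[bisect_right(cumulative, upper_pos)]
    return (lower + upper) / 2


sample = {100: 8, 110: 2, 1000: 4, 2200: 3, 4000: 1, 11000: 1}
print(median_from_counts(sample))  # prints: 110.0
```

Unlike `statistics.median`, this sketch always returns a float, because it averages the two middle values even when they coincide.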

Answer 1 (score: 0)

This will work on Python 3.6+, provided your dict is ordered, i.e. its keys were inserted in ascending order.

def find_weighted_median(d):
    total = sum(d.values())
    # 0-based positions of the middle element(s); equal when total is odd
    lower_location = (total - 1) // 2
    upper_location = total // 2
    lower = None
    upper = None
    running_total = 0
    for val, count in d.items():
        # This key covers positions running_total .. running_total + count - 1
        if lower is None and running_total <= lower_location < running_total + count:
            lower = val
        if upper is None and running_total <= upper_location < running_total + count:
            upper = val
        # Compare against None so that a key of 0 is handled correctly
        if lower is not None and upper is not None:
            return (lower + upper) / 2
        running_total += count
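Because a function like this walks the dictionary in its own iteration order, the keys have to be in ascending order before it is called. A minimal sketch of building such a sorted copy is below; `raw_counts` and `sorted_counts` are illustrative names, and it relies on dicts preserving insertion order (an implementation detail of CPython 3.6, guaranteed by the language from 3.7).

```python
# Keys deliberately out of order, as they might be after counting records.
raw_counts = {2200: 3, 100: 8, 11000: 1, 110: 2, 4000: 1, 1000: 4}

# sorted() orders the (value, count) pairs by value; dict() keeps that order.
sorted_counts = dict(sorted(raw_counts.items()))

print(list(sorted_counts))  # prints: [100, 110, 1000, 2200, 4000, 11000]
```

The sort touches only the distinct values, so its cost depends on the number of keys, not on the size of the underlying data set.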

Answer 2 (score: 0)

So, not having found a satisfactory answer, this is what I came up with:

from collections import OrderedDict
import statistics

d = {
 100: 8,
 110: 2,
 1000: 4,
 2200: 3,
 4000: 1,
 11000: 1,
}

# Sort the dictionary
values_sorted = OrderedDict(sorted(d.items(), key=lambda t: t[0]))
index = sum(values_sorted.values())/2

# Decide whether the number of records is an even or odd number
even = index.is_integer()

x = True

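# Compute median
for value, occurrences in values_sorted.items():
    index -= occurrences
    if index < 0 and x is True:
        # The middle of the data set falls inside this value's run
        median_manual = value
        break
    elif index == 0 and even is True:
        # Even count: the lower middle record is the last one of this run
        median_manual = value/2
        x = False
    elif index < 0 and x is False:
        # The upper middle record is the first one of the next run
        median_manual += value/2
        break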

# Create a list of all records and compute median using statistics package
values_list = list()
for val, count in d.items():
    values_list.extend([val] * count)

median_computed = statistics.median(values_list)

# Test the two results are equal
if median_manual != median_computed:
    raise RuntimeError

I tested it with different data sets and compared the results with the median computed by statistics.median(); the results were the same.
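That kind of comparison can be automated. The sketch below is one possible harness, not the answerer's actual test: it generates small random data sets, collapses each into a sorted count dictionary, and checks a candidate function against `statistics.median`. The names `agrees_with_statistics` and `running_total_median` are hypothetical, and the candidate shown is a compact stand-in assumed for illustration.

```python
import random
import statistics
from collections import Counter


def agrees_with_statistics(median_fn, trials=200, seed=0):
    """Return True if median_fn matches statistics.median on random data."""
    rng = random.Random(seed)
    for _ in range(trials):
        data = [rng.randint(0, 20) for _ in range(rng.randint(1, 30))]
        # Collapse the list into {value: count}, keys in ascending order.
        counts = dict(sorted(Counter(data).items()))
        if median_fn(counts) != statistics.median(data):
            return False
    return True


def running_total_median(counts):
    """Candidate under test: expects keys in ascending order."""
    total = sum(counts.values())
    lower_pos, upper_pos = (total - 1) // 2, total // 2  # 0-based middles
    lower = upper = None
    seen = 0
    for value, count in counts.items():
        seen += count
        if lower is None and lower_pos < seen:
            lower = value
        if upper_pos < seen:
            upper = value
            break
    return (lower + upper) / 2


print(agrees_with_statistics(running_total_median))
```

The random lists are kept small on purpose, since `statistics.median` needs the expanded data; the count-based function itself never builds that list.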