I have a dictionary that looks like this (although much larger):
{100: 8,
 110: 2,
 1000: 4,
 2200: 3,
 4000: 1,
 11000: 1,
}
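Each pair consists of value: number of occurrences in my dataset. I need to compute the median of the dataset. Any hints or ideas on how to do this?
I am using Python 3.6.
Edit:
I do not want to create a list (because of the size of my dataset). In fact, the size of the list is the very reason I use a dictionary. So I am looking for another way.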
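Answer 0 (score: 1)
I believe this solution works as well, at least for positive numbers. I tested it against your answer on a few datasets and, as far as I can tell, both give the same results.
(sorted_dict is the dictionary sorted by its numeric keys)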
def median_of_sorted_dict(sorted_dict):
    # total number of records in the dataset
    length = sum(sorted_dict.values())
    half = length / 2
    sum_var = 0
    # finds the index of the middle of the dataset: the first key at which
    # the running count has already reached half of the records.
    # enumerate gives the position directly; list.index(val) would return
    # the wrong position whenever two keys share the same count
    for index, val in enumerate(sorted_dict.values()):
        if half - sum_var > 0:
            sum_var += val
        else:
            break
    keys = list(sorted_dict.keys())
    values = list(sorted_dict.values())
    # returns the median based on how the counts split around that index
    if sum(values[index:]) != sum(values[:index]):
        if sum(values[index:]) > sum(values[:index]):
            median = keys[index]
        else:
            median = keys[index - 1]
    else:
        # exactly half of the records lie on each side: average the two neighbours
        median = (keys[index - 1] + keys[index]) / 2
    return median
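The snippet above takes sorted_dict as given. A minimal sketch of one way to build it from an unordered dictionary, assuming Python 3.6+ (insertion order is an implementation detail of CPython 3.6 and guaranteed by the language from 3.7); the sample data is the dictionary from the question, deliberately shuffled:

```python
d = {1000: 4, 100: 8, 11000: 1, 110: 2, 4000: 1, 2200: 3}

# sorted() orders the (key, count) pairs by key, and dict() keeps that order
sorted_dict = dict(sorted(d.items()))

print(list(sorted_dict))  # [100, 110, 1000, 2200, 4000, 11000]
```

This only reorders the distinct values, so it stays cheap even when the counts are large.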
Answer 1 (score: 0)
This will work on Python 3.6+, as long as your dict is ordered (keys in ascending order).
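from math import floor, ceil

def find_weighted_median(d):
    # 0-based position of the median in the sorted dataset;
    # it falls between two records when the total count is even
    median_location = (sum(d.values()) - 1) / 2
    lower_location = floor(median_location)
    upper_location = ceil(median_location)
    lower = None
    upper = None
    running_total = 0
    for val, count in d.items():
        # val occupies positions running_total .. running_total + count - 1
        if lower is None and running_total <= lower_location < running_total + count:
            lower = val
        if running_total <= upper_location < running_total + count:
            upper = val
        # compare against None so that a value of 0 is not mistaken for "not found"
        if lower is not None and upper is not None:
            return (lower + upper) / 2
        running_total += count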
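Whichever form the scan takes, the median of n sorted records depends only on the one or two records in the middle. A small sketch of where those sit; the helper name middle_positions is hypothetical, and positions are counted from 0:

```python
def middle_positions(n):
    """0-based positions of the middle record(s) among n sorted records."""
    return (n - 1) // 2, n // 2


print(middle_positions(19))  # (9, 9): odd count, a single middle record
print(middle_positions(4))   # (1, 2): even count, two records to average
```

For the dictionary in the question the counts add up to 19, so only the record at position 9 matters.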
Answer 2 (score: 0)
So, not having found a satisfying answer, this is what I came up with:
from collections import OrderedDict
import statistics

d = {
    100: 8,
    110: 2,
    1000: 4,
    2200: 3,
    4000: 1,
    11000: 1,
}

# Sort the dictionary by key
values_sorted = OrderedDict(sorted(d.items(), key=lambda t: t[0]))
index = sum(values_sorted.values()) / 2

# Decide whether the number of records is an even or odd number
even = index.is_integer()

# x stays True until the first of the two middle records has been found
x = True
# Compute median
for value, occurrences in values_sorted.items():
    index -= occurrences
    if index < 0 and x is True:
        median_manual = value
        break
    elif index == 0 and even is True:
        median_manual = value / 2
        x = False
    elif index < 0 and x is False:
        median_manual += value / 2
        break

# Create a list of all records and compute median using statistics package
values_list = list()
for val, count in d.items():
    for _ in range(count):
        values_list.append(val)
median_computed = statistics.median(values_list)

# Test the two results are equal
if median_manual != median_computed:
    raise RuntimeError
I tested it with different datasets and compared the results with the median computed by statistics.median(); the results were the same.
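For comparison, the same cumulative-count idea can be sketched more compactly with itertools.accumulate and bisect from the standard library. This is only a sketch: the function name counts_median is hypothetical, it assumes a non-empty dictionary whose counts are all positive integers, and it builds one running total per distinct value (not per record), so it never expands the dataset.

```python
from bisect import bisect_right
from itertools import accumulate


def counts_median(counts):
    """Median of a dataset given as {value: number of occurrences}."""
    keys = sorted(counts)  # distinct values, ascending
    cumulative = list(accumulate(counts[k] for k in keys))  # running totals
    total = cumulative[-1]
    # 0-based positions of the middle record(s) in the sorted dataset
    lo_pos, hi_pos = (total - 1) // 2, total // 2
    # bisect_right counts the running totals <= position, which is the
    # index of the value that occupies that position
    lo = keys[bisect_right(cumulative, lo_pos)]
    hi = keys[bisect_right(cumulative, hi_pos)]
    return lo if lo == hi else (lo + hi) / 2


d = {100: 8, 110: 2, 1000: 4, 2200: 3, 4000: 1, 11000: 1}
print(counts_median(d))  # 110
```

The binary search is not essential here, since building the running totals is already linear in the number of distinct values; it mainly keeps the lookup of the two middle positions short.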