Frequency mean calculation for an arbitrary distribution in pandas

Time: 2018-03-06 06:45:03

Tags: python python-3.x pandas mean frequency-analysis

I have a large dataset with values ranging from … at a resolution of …. The distribution is essentially arbitrary, with a mode of 1. A sample dataset could be:

1 to 25

How can I count the values that fall into different ranges (0-0.5, 0.5-1, etc.) and then compute their frequency mean with pandas in Python?

The expected output could be:

Value range (f)    Occurrences (n)    f * n

0.1
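The three columns of the expected output correspond to the usual mean of grouped data. Assuming f_i is a representative value of range i (for example its midpoint, as the answer below uses) and n_i is the number of occurrences in that range, the frequency mean is

$$\bar{x} \approx \frac{\sum_i f_i \, n_i}{\sum_i n_i}$$

It is only an approximation of the true mean, because every value inside a range is replaced by that range's f_i.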

1 Answer:

Answer 0 (score: 2)

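You need cut for binning. Then convert the CategoricalIndex to an IntervalIndex to get each bin's mid value, multiply the count and mid columns with mul, sum, and finally divide one scalar by the other: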

import numpy as np
import pandas as pd

df = pd.DataFrame({'col':[1,2.2,2.8,3.7,5.5,5.8,4.3,2.7,3.5,1.8,5.9]})
print (df)
    col
0   1.0
1   2.2
2   2.8
3   3.7
4   5.5
5   5.8
6   4.3
7   2.7
8   3.5
9   1.8
10  5.9
binned = pd.cut(df['col'], np.arange(1, 7), include_lowest=True)
# observed=False keeps every bin, including empty ones
df1 = df.groupby(binned, observed=False).size().reset_index(name='val')
df1['mid'] = pd.IntervalIndex(df1['col']).mid
df1['mul'] = df1['val'].mul(df1['mid'])
print (df1)
            col  val     mid     mul
0  (0.999, 2.0]    2  1.4995   2.999
1    (2.0, 3.0]    3  2.5000   7.500
2    (3.0, 4.0]    2  3.5000   7.000
3    (4.0, 5.0]    1  4.5000   4.500
4    (5.0, 6.0]    3  5.5000  16.500

a = df1[['val', 'mid', 'mul']].sum()  # 'col' holds intervals, so sum only the numeric columns
print (a)
val    11.0000
mid    17.4995
mul    38.4990
dtype: float64

b = a['mul'] / a['val']
print (b)
3.49990909091
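The same steps can be wrapped in a small reusable function. This is a sketch, not part of the original answer: the names `binned_frequency_mean` and `histogram_frequency_mean` are hypothetical, and the 0.5-wide edges are an assumption that mirrors the ranges (0-0.5, 0.5-1, ...) mentioned in the question. The second function cross-checks the result with `numpy.histogram`, whose bins are left-closed [a, b) (the last bin is closed on both ends), whereas `pd.cut` uses right-closed (a, b] bins by default.

```python
import numpy as np
import pandas as pd


def binned_frequency_mean(values: pd.Series, edges) -> float:
    """Hypothetical helper: estimate the mean of `values` from binned counts.

    Follows the answer's steps: pd.cut (right-closed bins), count per bin,
    weight each bin midpoint by its count, then divide by the total count.
    """
    binned = pd.cut(values, edges, include_lowest=True)
    # observed=False keeps empty bins, so counts line up with all categories
    counts = values.groupby(binned, observed=False).size()
    # The categories of a pd.cut result form an IntervalIndex, which has .mid
    mids = binned.cat.categories.mid
    return float((counts.to_numpy() * mids.to_numpy()).sum() / counts.sum())


def histogram_frequency_mean(values, edges) -> float:
    """Hypothetical helper: same estimate via numpy.histogram ([a, b) bins)."""
    counts, bin_edges = np.histogram(values, bins=edges)
    mids = (bin_edges[:-1] + bin_edges[1:]) / 2
    return float((counts * mids).sum() / counts.sum())


df = pd.DataFrame({'col': [1, 2.2, 2.8, 3.7, 5.5, 5.8, 4.3, 2.7, 3.5, 1.8, 5.9]})
edges = np.arange(0, 6.5, 0.5)  # assumed edges: 0, 0.5, 1.0, ..., 6.0

cut_mean = binned_frequency_mean(df['col'], edges)
hist_mean = histogram_frequency_mean(df['col'], edges)
print(cut_mean, hist_mean, df['col'].mean())
```

With these edges, the values lying exactly on an edge (1.0, 3.5 and 5.5) land in different bins under the two conventions, so the estimates differ (38.25/11 for `pd.cut`, 39.75/11 for `np.histogram`), while the plain mean of the column is about 3.564. Calling `binned_frequency_mean(df['col'], np.arange(1, 7))` reproduces the answer's 38.499/11.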