How do I plot a histogram in dask?

Asked: 2016-06-28 21:38:38

Tags: python dask

t is a dask array, and I want to plot a histogram of it. The Dask documentation lists the method

dask.array.histogram(a, bins=None, range=None, normed=False, weights=None, density=None)

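but gives no example. I tried setting bins with a NumPy array, and that did not work. I also tried matplotlib.pyplot, which ran for more than 5 minutes without producing anything (my dataset is very large, on the order of gigabytes, but that still seems like a long time).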

2 answers:

Answer 0 (score: 1)

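dask.array.histogram requires you to set bins and range, which specify the number of bins and the min/max range of your data, respectively. Here is a simple example: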

In [1]: import dask.array as da

In [2]: x = da.random.normal(10, 0.1, size=(100000,), chunks=(1000,))  # random dataset 

In [3]: h, bins = da.histogram(x, bins=100, range=[9, 11])

In [4]: bins
Out[4]: 
array([  9.  ,   9.02,   9.04,   9.06,   9.08,   9.1 ,   9.12,   9.14,
         9.16,   9.18,   9.2 ,   9.22,   9.24,   9.26,   9.28,   9.3 ,
         9.32,   9.34,   9.36,   9.38,   9.4 ,   9.42,   9.44,   9.46,
         9.48,   9.5 ,   9.52,   9.54,   9.56,   9.58,   9.6 ,   9.62,
         9.64,   9.66,   9.68,   9.7 ,   9.72,   9.74,   9.76,   9.78,
         9.8 ,   9.82,   9.84,   9.86,   9.88,   9.9 ,   9.92,   9.94,
         9.96,   9.98,  10.  ,  10.02,  10.04,  10.06,  10.08,  10.1 ,
        10.12,  10.14,  10.16,  10.18,  10.2 ,  10.22,  10.24,  10.26,
        10.28,  10.3 ,  10.32,  10.34,  10.36,  10.38,  10.4 ,  10.42,
        10.44,  10.46,  10.48,  10.5 ,  10.52,  10.54,  10.56,  10.58,
        10.6 ,  10.62,  10.64,  10.66,  10.68,  10.7 ,  10.72,  10.74,
        10.76,  10.78,  10.8 ,  10.82,  10.84,  10.86,  10.88,  10.9 ,
        10.92,  10.94,  10.96,  10.98,  11.  ])

In [5]: h.compute()
Out[5]: 
array([   0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
          0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
          0,    0,    0,    0,    0,    0,    0,    1,    1,    4,   15,
         19,   71,  132,  231,  376,  604,  891, 1307, 1884, 2635, 3422,
       4276, 5455, 6158, 7092, 7759, 7933, 7994, 7625, 6994, 6194, 5315,
       4272, 3381, 2529, 1803, 1324,  912,  594,  331,  225,  127,   54,
         32,   12,   10,    2,    2,    1,    1,    0,    0,    0,    0,
          0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
          0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
          0])
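
To turn those counts into an actual plot, one option is to compute only the histogram and hand the result to matplotlib. The following is a minimal sketch rather than part of the original answer: it assumes matplotlib is installed, reuses the same random dataset, and writes to a file name (histogram.png) chosen purely for illustration.

```python
import dask.array as da
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

# Same random dataset as in the example above.
x = da.random.normal(10, 0.1, size=(100000,), chunks=(1000,))

# Build the lazy histogram, then compute only the 100 bin counts.
h, bins = da.histogram(x, bins=100, range=[9, 11])
counts = h.compute()
edges = np.asarray(bins)  # 101 bin edges

# Draw one bar per bin, aligned to the left edge of each bin.
fig, ax = plt.subplots()
ax.bar(edges[:-1], counts, width=np.diff(edges), align="edge")
ax.set_xlabel("value")
ax.set_ylabel("count")
fig.savefig("histogram.png")  # illustrative output file name
```

Because only the bin counts are brought into memory, matplotlib never sees the full array, which is likely what made the direct matplotlib.pyplot attempt in the question so slow.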

Answer 1 (score: 0)

hvplot can plot histograms directly from a Dask DataFrame.

The following is pseudocode. dd is a Dask DataFrame, and the histogram is plotted for the feature named feature_one:

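import hvplot.dask  # adds the .hvplot accessor to Dask DataFrames

# dd is an existing Dask DataFrame with a column named feature_one
dd.hvplot.hist(y="feature_one")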

Installing the library with conda is recommended:

conda install -c conda-forge hvplot