dask distributed MemoryError

Date: 2016-07-23 06:52:23

Tags: python dask

When running Dask on a distributed job, I get the following error on the scheduler:

distributed.core - ERROR -
Traceback (most recent call last):
  File "/usr/local/lib/python3.4/dist-packages/distributed/core.py", line 269, in write
    frames = protocol.dumps(msg)
  File "/usr/local/lib/python3.4/dist-packages/distributed/protocol.py", line 81, in dumps
    frames = dumps_msgpack(small)
  File "/usr/local/lib/python3.4/dist-packages/distributed/protocol.py", line 153, in dumps_msgpack
    payload = msgpack.dumps(msg, use_bin_type=True)
  File "/usr/local/lib/python3.4/dist-packages/msgpack/__init__.py", line 47, in packb
    return Packer(**kwargs).pack(o)
  File "msgpack/_packer.pyx", line 231, in msgpack._packer.Packer.pack (msgpack/_packer.cpp:231)
  File "msgpack/_packer.pyx", line 239, in msgpack._packer.Packer.pack (msgpack/_packer.cpp:239)
MemoryError

Is the scheduler or one of the workers running out of memory? Or both?

1 answer:

Answer 0 (score: 2)

The most common cause of this error is trying to collect too much data, as happens in the following example using dask.dataframe:

df = dd.read_csv('s3://bucket/lots-of-data-*.csv')
df.compute()

This loads all of the data into RAM across the cluster (which is fine), and then tries to bring the entire result back to the local machine by way of the scheduler (which probably cannot handle your hundreds of GB of data all in one place). Worker-to-client communications pass through the scheduler, so it is the first single machine to receive all of the data and the first machine likely to fail.
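If all you need locally is a summary, a common pattern is to reduce the dataframe on the cluster and collect only the small result. A minimal sketch (the column name 'value' is a placeholder, not from the question):

import dask.dataframe as dd

df = dd.read_csv('s3://bucket/lots-of-data-*.csv')  # lazily defines the dataframe across partitions
total = df['value'].sum().compute()                 # workers do the aggregation; only a single number returns via the scheduler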

If this is the case, then you probably want to use the Executor.persist method to trigger computation but keep the data on the cluster:

df = dd.read_csv('s3://bucket/lots-of-data-*.csv')
df = e.persist(df)

Generally we only use df.compute() for small results that we want to view in our local session.
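For completeness, a minimal end-to-end sketch of the persist workflow (the scheduler address is an assumption; in later releases of distributed the Executor class was renamed Client):

from distributed import Executor
import dask.dataframe as dd

# Connect to the running scheduler; the address below is an assumption.
e = Executor('scheduler-host:8786')

df = dd.read_csv('s3://bucket/lots-of-data-*.csv')  # lazy dataframe spread across the workers
df = e.persist(df)                                  # compute and keep the partitions in worker memory

# Only collect small, already-reduced results into the local session.
print(len(df))  # the row count is tiny compared to the full data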