Dask agg function pickle error

Time: 2017-11-10 09:29:35

Tags: python pickle dask

I have the following dask dataframe:

@timestamp                        datetime64[ns]
@version                                  object
dst                                       object
dst_port                                  object
host                                      object
http_req_header_contentlength             object
http_req_header_host                      object
http_req_header_referer                   object
http_req_header_useragent                 object
http_req_method                           object
http_req_secondleveldomain                object
http_req_url                              object
http_req_version                          object
http_resp_code                            object
http_resp_header_contentlength            object
http_resp_header_contenttype              object
http_user                                 object
local_time                                object
path                                      object
src                                       object
src_port                                  object
tags                                      object
type                                       int64
dtype: object

I am trying to run a group-by operation on it:

grouped_by_df = df.groupby(['http_user', 'src'])['@timestamp'].agg(['min', 'max']).reset_index()

Running grouped_by_df.count().compute() raises the following error:

Traceback (most recent call last):
  File "/home/avlach/virtualenvs/dask/local/lib/python2.7/site-packages/IPython/core/interactiveshell.py", line 2881, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-62-9acb48b4ac67>", line 1, in <module>
    user_host_map.count().compute()
  File "/home/avlach/virtualenvs/dask/local/lib/python2.7/site-packages/dask/base.py", line 98, in compute
    (result,) = compute(self, traverse=False, **kwargs)
  File "/home/avlach/virtualenvs/dask/local/lib/python2.7/site-packages/dask/base.py", line 205, in compute
    results = get(dsk, keys, **kwargs)
  File "/home/avlach/virtualenvs/dask/local/lib/python2.7/site-packages/distributed/client.py", line 1893, in get
    results = self.gather(packed)
  File "/home/avlach/virtualenvs/dask/local/lib/python2.7/site-packages/distributed/client.py", line 1355, in gather
    direct=direct, local_worker=local_worker)
  File "/home/avlach/virtualenvs/dask/local/lib/python2.7/site-packages/distributed/client.py", line 531, in sync
    return sync(self.loop, func, *args, **kwargs)
  File "/home/avlach/virtualenvs/dask/local/lib/python2.7/site-packages/distributed/utils.py", line 234, in sync
    six.reraise(*error[0])
  File "/home/avlach/virtualenvs/dask/local/lib/python2.7/site-packages/distributed/utils.py", line 223, in f
    result[0] = yield make_coro()
  File "/home/avlach/virtualenvs/dask/local/lib/python2.7/site-packages/tornado/gen.py", line 1055, in run
    value = future.result()
  File "/home/avlach/virtualenvs/dask/local/lib/python2.7/site-packages/tornado/concurrent.py", line 238, in result
    raise_exc_info(self._exc_info)
  File "/home/avlach/virtualenvs/dask/local/lib/python2.7/site-packages/tornado/gen.py", line 1063, in run
    yielded = self.gen.throw(*exc_info)
  File "/home/avlach/virtualenvs/dask/local/lib/python2.7/site-packages/distributed/client.py", line 1235, in _gather
    traceback)
  File "/home/avlach/virtualenvs/dask/local/lib/python2.7/site-packages/distributed/protocol/pickle.py", line 59, in loads
    return pickle.loads(x)
TypeError: itemgetter expected 1 arguments, got 0

I am using dask version 0.15.1 with the distributed client. What could be causing this?

1 Answer:

Answer 0 (score: 0)

We just ran into a similar error. We were running something of the form:

df[['col1','col2']].groupby('col1').agg("count")

and ended up with a similar error:

    return pickle.loads(x)
TypeError: itemgetter expected 1 arguments, got 0

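However, when we rewrote the groupby in the form:

df.groupby('col1')['col2'].count()

we no longer hit the error. We have now reproduced this several times, so it does not appear to be a fluke. We are not at all sure why this happens, but it is worth trying if anyone is struggling with the same issue.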
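As a sanity check that the rewrite does not change the result, here is a minimal sketch comparing the two forms. It makes two assumptions: plain pandas stands in for dask (the groupby expressions are written the same way), and the toy data is hypothetical, with `col1` and `col2` taken from the placeholder names above. It does not reproduce the pickle error itself, which was only observed with dask and the distributed client.

```python
import pandas as pd

# Hypothetical toy data; col1/col2 mirror the placeholder names used above.
df = pd.DataFrame({
    "col1": ["a", "a", "b", "b", "b"],
    "col2": [1.0, None, 3.0, 4.0, 5.0],
})

# Form that triggered the pickle error in the report above (under dask).
via_agg = df[["col1", "col2"]].groupby("col1").agg("count")

# Rewritten form that avoided the error.
via_count = df.groupby("col1")["col2"].count()

# The first form returns a one-column DataFrame and the second a Series,
# but the per-group counts of non-null values are the same.
same = via_agg["col2"].equals(via_count)
print(same)
```

The only practical difference is the return type, so code that consumes the result may need `via_count.to_frame()` to get a DataFrame back.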