Python Dask AttributeError: 'module' object has no attribute 'broadcast_to'

Time: 2018-11-01 03:04:47

Tags: python dask attributeerror

I have written the following code to try out Dask, so that I can use multiple processors on a Unix server:

import pandas as pd
import sys
import dask.dataframe as dd
from dask.multiprocessing import get


numbers = pd.read_csv("head_5_22SNPs_CMI.txt", sep="\t", header=None)

combinations = pd.read_csv("all_combinations_5snps.txt", sep=" ", header=None)

data_dask = dd.from_pandas(combinations, npartitions=5)

pop = int(1 + 5)

score_col, freq_col = [], []

def score_freq(line):
    score=0
    freq=1
    for j in range(len(line)):
        if line[j][1] != numbers.values[j][1]:   # homozygous for ref
            score+=0
            freq*=(float(1-float(numbers.values[j][pop]))*float(1-float(numbers.values[j][pop])))
        elif line[j][0] != numbers.values[j][1] and line[j][1] == numbers.values[j][1]: # heterozygous
            score+=(float(numbers.values[j][5]))
            freq*=(2*(float(1-float(numbers.values[j][pop]))*float(numbers.values[j][pop])))
        elif line[j][0] == numbers.values[j][1]:
            score+=2*(float(numbers.values[j][5]))
            freq*=(float(numbers.values[j][pop])*float(numbers.values[j][pop]))

        if freq < 1e-5:   # threshold to stop loop in interest of efficiency 
            break


    return pd.Series([score, freq])

res = data_dask.map_partitions(lambda df: df.apply((lambda row: score_freq(row)), axis=1)).compute(scheduler=get)

res.to_csv('dask_test.txt', index=False)

When I run this code on my Unix server, I get this error:

Traceback (most recent call last):
  File "compute_scores_pandas+dask_testing.py", line 3, in <module>
    import dask.dataframe as dd
  File "/hpc/home/lsiwzyj/anaconda/lib/python2.7/site-packages/dask/dataframe/__init__.py", line 4, in <module>
    from .core import (DataFrame, Series, Index, _Frame, map_partitions,
  File "/hpc/home/lsiwzyj/anaconda/lib/python2.7/site-packages/dask/dataframe/core.py", line 19, in <module>
    from .. import array as da
  File "/hpc/home/lsiwzyj/anaconda/lib/python2.7/site-packages/dask/array/__init__.py", line 5, in <module>
    from .core import (Array, block, concatenate, stack, from_array, store,
  File "/hpc/home/lsiwzyj/anaconda/lib/python2.7/site-packages/dask/array/core.py", line 31, in <module>
    from . import chunk
  File "/hpc/home/lsiwzyj/anaconda/lib/python2.7/site-packages/dask/array/chunk.py", line 19, in <module>
    broadcast_to = npcompat.broadcast_to
AttributeError: 'module' object has no attribute 'broadcast_to'

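After some googling, it seems the problem may be a name clash between classes, but I can't find any such clash in my script. I also tried upgrading the Dask package, but got this warning: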

Cannot uninstall 'python-dateutil'. It is a distutils installed project and thus we cannot accurately determine which files belong to it which would lead to only a partial uninstall.

Does anyone know what the problem is? The script runs fine on Windows in my IDE.

2 answers:

Answer 0 (score: 1)

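The problem is a name clash between two libraries: this line tries to import the name broadcast_to from npcompat, which gets confused because the name numpy has two different definitions, here and possibly here, although the latter is a guess.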
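One way to test the name-clash theory on the server is to ask the interpreter which numpy it actually imports and whether that module exposes broadcast_to (NumPy added broadcast_to in version 1.10). The sketch below is a quick diagnostic, not a fix, and the helper name describe_module is hypothetical. It also prints sys.path, because a stray numpy.py or numpy/ folder next to the script would shadow the real package.

```python
import importlib
import sys


def describe_module(name):
    """Hypothetical helper: report where `name` is imported from and whether it has broadcast_to."""
    try:
        mod = importlib.import_module(name)
    except ImportError as exc:
        # The module cannot be imported by this interpreter at all.
        return {"name": name, "error": str(exc)}
    return {
        "name": name,
        # Built-in modules such as sys have no __file__, so fall back to None.
        "file": getattr(mod, "__file__", None),
        "version": getattr(mod, "__version__", "unknown"),
        "has_broadcast_to": hasattr(mod, "broadcast_to"),
    }


if __name__ == "__main__":
    # Which interpreter is running, and where it looks for modules.
    print("python: " + sys.executable)
    for entry in sys.path:
        print("path: " + entry)
    # An unexpected file location or an old version here points at the clash.
    print(describe_module("numpy"))
```

Run it with the same python that runs the failing script, so that it sees the same site-packages.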

In the comments above, @mdurant pointed to a bug in dask that is discussed here.

You don't seem to have the fix installed; unless there is a good reason not to, installing it would be my solution.

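Reading your comment, I'm not entirely clear on the cause (so this isn't that useful here) and would have to look into it further, but I would start by getting rid of that warning. You can do that by removing python-dateutil with the conda remove command. Then upgrade dask again; python-dateutil will be reinstalled (at the latest version), and the warning should no longer appear.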
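The two steps above can be scripted as a dry run, which prints the commands before anything is changed. This is a sketch under a few assumptions: the helper names are hypothetical; dask is upgraded with pip, since the warning is a pip message; and dask is installed with the [dataframe] extra, which pulls in pandas and, through pandas, python-dateutil, so the package comes back after removal. Be aware that conda remove may also offer to remove packages that depend on python-dateutil, such as pandas, so read its prompt before you confirm.

```python
import subprocess
import sys


def dateutil_fix_commands(python=sys.executable):
    """Hypothetical helper: the remove-then-upgrade steps as argument lists."""
    return [
        # Let conda remove the distutils-installed python-dateutil that pip
        # refuses to uninstall. conda prompts for confirmation interactively.
        ["conda", "remove", "python-dateutil"],
        # Upgrade dask with the same interpreter's pip; the [dataframe] extra
        # depends on pandas, which in turn requires python-dateutil.
        [python, "-m", "pip", "install", "--upgrade", "dask[dataframe]"],
    ]


def run(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.check_call(cmd)


if __name__ == "__main__":
    # Review the printed commands first, then rerun with dry_run=False.
    run(dateutil_fix_commands())
```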

Answer 1 (score: 0)

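I also ran into this problem when I created a requirements.txt file for a module full of functions decorated with dask. There seems to be a bug in dask's backend code, so I found a workaround using: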

import dask.delayed as delayed

@delayed
def some_fun(x):
    return x

instead of

import dask
@dask.delayed
def some_fun(x):
    return x

Quirky, but it works, and it is more robust across the various environments I've deployed it to.
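Another way to bind the decorator to a local name is from dask import delayed, which is the form the dask documentation uses. This answer does not test whether it avoids the same environment problem as the workaround above. The sketch below shows how the decorated functions behave. The functions inc and add are hypothetical examples: calling them only builds a task graph, and compute() runs it.

```python
from dask import delayed


@delayed
def inc(x):
    # Lazy: calling inc(...) records a task instead of running it.
    return x + 1


@delayed
def add(a, b):
    return a + b


# Build the task graph; nothing has executed yet.
total = add(inc(1), inc(2))

# compute() executes the graph (delayed uses the threaded scheduler by default).
print(total.compute())  # 5
```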