Dask: TypeError: __call__() takes 2 positional arguments but 3 were given

Asked: 2019-10-18 10:30:10

Tags: dask

I am running into this error when using Dask, and I don't know how to solve it, because nothing in my code seems to explain it.

What I am doing is reading a dataframe and tagging its text column with stanfordnlp, then extracting the nouns. It works fine with pandas alone, but with dask I get this error. I am on Ubuntu with Python 3.7.3 and Dask 2.6.0.
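
For comparison, the pandas-only version that works looks roughly like this (a simplified sketch; process, nlp, lang and wp are the same as in the full code further down):

import pandas as pd

# Works: pandas forwards the extra arguments straight through to process.
df = pd.read_csv('sample.csv')
df['tagged'] = df['Message'].apply(process, args=(nlp, lang, wp))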

This is my error:

Traceback (most recent call last):
  File "main.py", line 56, in <module>
    main(df=data, nlp=nlp, lang=lang, wanted_pos=wp)
  File "main.py", line 13, in main
    df.persist()
  File "/home/bertil/Envs/datascience/lib/python3.7/site-packages/dask/base.py", line 138, in persist
    (result,) = persist(self, traverse=False, **kwargs)
  File "/home/bertil/Envs/datascience/lib/python3.7/site-packages/dask/base.py", line 629, in persist
    results = schedule(dsk, keys, **kwargs)
  File "/home/bertil/Envs/datascience/lib/python3.7/site-packages/dask/threaded.py", line 80, in get
    **kwargs
  File "/home/bertil/Envs/datascience/lib/python3.7/site-packages/dask/local.py", line 486, in get_async
    raise_exception(exc, tb)
  File "/home/bertil/Envs/datascience/lib/python3.7/site-packages/dask/local.py", line 316, in reraise
    raise exc
  File "/home/bertil/Envs/datascience/lib/python3.7/site-packages/dask/local.py", line 222, in execute_task
    result = _execute_task(task, data)
  File "/home/bertil/Envs/datascience/lib/python3.7/site-packages/dask/core.py", line 118, in _execute_task
    args2 = [_execute_task(a, cache) for a in args]
  File "/home/bertil/Envs/datascience/lib/python3.7/site-packages/dask/core.py", line 118, in <listcomp>
    args2 = [_execute_task(a, cache) for a in args]
  File "/home/bertil/Envs/datascience/lib/python3.7/site-packages/dask/core.py", line 119, in _execute_task
    return func(*args2)
TypeError: __call__() takes 2 positional arguments but 3 were given
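
Looking at the traceback, the failure happens inside dask's _execute_task. As far as I understand, dask represents a task as a tuple whose first element is callable, so my guess (not confirmed) is that my args tuple, which starts with the callable nlp pipeline, is being mistaken for a task and executed:

# Guess at the mechanism, not a confirmed diagnosis: dask treats a tuple
# whose first element is callable as a task to run.
from dask.core import istask

def add(a, b):
    return a + b

print(istask((add, 1, 2)))  # True: dask would execute this as add(1, 2)

# My args tuple (nlp, lang, wanted_pos) also starts with a callable, so dask
# may try to execute it as nlp(lang, wanted_pos). stanfordnlp's Pipeline
# __call__ takes exactly one argument, which would explain
# "__call__() takes 2 positional arguments but 3 were given".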

This is my code:

#!/usr/bin/env python

from pathlib import Path
import dask.dataframe as dd
import stanfordnlp
import string


def main(df, nlp, lang, wanted_pos):
    df['tagged'] = df['Message'].apply(process,
                                       args=(nlp, lang, wanted_pos,),
                                       meta=('Message', 'object'))
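    # The TypeError above is raised here, when persist() actually runs the graph.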
    df.persist()
    df.to_csv(f'output.csv')


def process(text, nlp, lang, wanted_pos):
    # Remove punctuation
    text = text.translate(str.maketrans('', '', string.punctuation))
    token = nlp(text)
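    # token is a stanfordnlp Document; gather the Word objects from its sentences.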
    words = {word for sent in token.sentences for word in sent.words}
    wanted_words = set(filter(lambda x: x in wanted_pos, words))
    wanted_words = ','.join(word for word in wanted_words if word)
    return wanted_words


if __name__ == '__main__':
    # Choose language
    lang = 'da'

    # Chose wanted_pos
    wp = ['NOUN']

    # Read data in chunks
    data = dd.read_csv('sample.csv', quoting=3, error_bad_lines=False,
                       dtype={'Message': 'object',
                              'Action Time': 'object',
                              'ClientQueues': 'object',
                              'Country': 'object',
                              'Custom Tags': 'object',
                              'Favorites': 'object',
                              'Geo Target': 'object',
                              'Location': 'object',
                              'State': 'object'})

    # Download model for nlp.
    stanford_path = Path.home() / 'stanfordnlp_resources' / f'{lang}_ddt_models'
    if not stanford_path.exists():
        stanfordnlp.download(lang)

    # Set up nlp pipeline
    nlp = stanfordnlp.Pipeline(processors='tokenize,lemma,pos', lang=lang)

    main(df=data, nlp=nlp, lang=lang, wanted_pos=wp)
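
One thing I plan to try (an untested sketch, assuming my guess above about the args tuple is right) is binding the extra arguments with functools.partial instead of passing them through args, so no tuple starting with a callable ends up in the graph:

from functools import partial

def main(df, nlp, lang, wanted_pos):
    # Untested sketch: bind the extra arguments up front so apply() only sees
    # a one-argument function and no tuple that starts with a callable.
    tagger = partial(process, nlp=nlp, lang=lang, wanted_pos=wanted_pos)
    df['tagged'] = df['Message'].apply(tagger, meta=('Message', 'object'))
    df.persist()
    df.to_csv('output.csv')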

Update: I thought I had fixed it, but I had not, so I deleted my answer again. I am still having the problem.

0 Answers:

No answers yet.