Unable to install spacy on AWS SageMaker

Time: 2018-04-05 06:50:42

Tags: linux python-3.x amazon-web-services spacy amazon-sagemaker

I am trying to load spacy into SageMaker. I run the following in a Jupyter notebook instance

!pip install spacy

I end up getting this error

  gcc: error trying to exec 'cc1plus': execvp: No such file or directory
  error: command 'gcc' failed with exit status 1
How can I resolve this on SageMaker?

3 answers:

Answer 0 (score: 0)

From https://stackoverflow.com/a/38733918/3276830

Fix gcc

sudo apt-get update
sudo apt-get install --reinstall build-essential

I am not sure about the second error; perhaps murmurhash/mrmr.cpp does not exist?
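
Before reinstalling anything, it may help to confirm that a C++ compiler is actually missing. cc1plus is the C++ front end of GCC, so the error usually suggests that gcc is present but g++ is not. The following is a minimal sketch, not a definitive diagnosis: the function name missing_build_tools is made up for illustration, and the check only looks at PATH. Also note that if apt-get is not available on the instance (SageMaker notebook instances are generally based on Amazon Linux, which uses yum), the equivalent package would need to be installed with that package manager instead.

```python
import shutil


def missing_build_tools(tools=("gcc", "g++")):
    """Return the names in `tools` that cannot be found on PATH.

    Hypothetical helper for illustration; it does not verify that the
    compilers actually work, only that an executable with that name exists.
    """
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_build_tools()
    if missing:
        print("Missing build tools:", ", ".join(missing))
    else:
        print("gcc and g++ are both on PATH")
```

If g++ is reported as missing, installing the C++ compiler package for the instance's distribution is the likely fix.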

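Answer 1 (score: 0)

When I started using SageMaker I ran into similar problems, so I developed this open-source project, sagify (https://github.com/Kenza-AI/sagify), a CLI tool that helps you train and deploy your own machine learning/deep learning models on SageMaker in a very simple way. I have managed to train and deploy all of my ML models regardless of the library I used (Keras, Tensorflow, scikit-learn, LightFM, spacy, etc.). Basically, you specify all your dependencies in the classic Pythonic way, i.e. in requirements.txt, and sagify reads them and installs them in a Docker image. This Docker image can then be executed on SageMaker for training and deployment.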

Answer 2 (score: 0)

You can try installing spacy from a Jupyter cell, with the Python3 kernel selected, using the following commands

!conda update --all -y 
!conda install -n python3 -y -c conda-forge spacy

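and then restart the kernel.

After restarting the kernel, you should be able to import spacy. Alternatively, you can run the same commands from the Jupyter terminal; just remove the leading ! from each command.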
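
To check that the install landed in the environment the kernel is actually using, something like the following can be run in a cell after the restart. This is only a sketch: installed_version is a hypothetical helper name, and it simply reports whether the package can be imported by the current interpreter.

```python
import importlib
import importlib.util
import sys


def installed_version(package):
    """Return the package's __version__ if it is importable, else None.

    Hypothetical helper for illustration. Packages without a __version__
    attribute are reported as "unknown".
    """
    if importlib.util.find_spec(package) is None:
        return None
    module = importlib.import_module(package)
    return getattr(module, "__version__", "unknown")


if __name__ == "__main__":
    # The interpreter path shows which conda environment the kernel runs in.
    print("Kernel interpreter:", sys.executable)
    version = installed_version("spacy")
    if version is None:
        print("spacy is not importable from this kernel")
    else:
        print("spacy version:", version)
```

If spacy is still not importable, the interpreter path printed above can be compared with the environment that conda installed into (python3 in the commands above).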