I am trying to load spaCy on SageMaker. I run the following in a Jupyter notebook instance:
!pip install spacy
I end up getting this error:
gcc: error trying to exec 'cc1plus': execvp: No such file or directory
error: command 'gcc' failed with exit status 1
There is also a second error, which involves murmurhash/mrmr.cpp.
How can I resolve this on SageMaker?
Answer 0 (score: 0)
From https://stackoverflow.com/a/38733918/3276830
To fix gcc:
sudo apt-get update
sudo apt-get install --reinstall build-essential
I'm not sure about the second error. Maybe murmurhash/mrmr.cpp does not exist?
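Before reinstalling anything, it may help to check what is actually present on the instance. The `cc1plus` error usually means the C++ part of GCC (the `g++` package) is missing even though `gcc` itself is installed. The snippet below is only a rough sketch: `find_build_tools` is a made-up helper name, and the list of package managers is an assumption, since the notebook instance may not be Debian-based and `apt-get` may not be available there.

```python
import shutil


def find_build_tools(tools=("gcc", "g++"), package_managers=("apt-get", "yum")):
    """Return the resolved path (or None) for each compiler and package manager.

    The tool and package-manager names are assumptions for illustration;
    adjust them to whatever the instance is expected to provide.
    """
    return {name: shutil.which(name) for name in tools + package_managers}


status = find_build_tools()
for name, path in status.items():
    # A missing g++ is consistent with the 'cc1plus' error in the question.
    print(f"{name}: {path or 'not found'}")
```

If `g++` is reported as not found, installing the C++ compiler with whichever package manager is listed as present is the step to try before running pip again.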
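Answer 1 (score: 0)
I ran into similar problems when I started using SageMaker, so I developed an open source project, sagify (https://github.com/Kenza-AI/sagify). It is a CLI tool that helps you train and deploy your own machine learning / deep learning models on SageMaker in a very simple way. I have managed to train and deploy all of my ML models with it, whatever library I used (Keras, TensorFlow, scikit-learn, LightFM, spaCy, etc.). Basically, you specify all your dependencies in the usual Pythonic way, i.e. in requirements.txt, and sagify reads them and installs them into a Docker image. That Docker image can then be run on SageMaker for training and deployment.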
Answer 2 (score: 0)
You can try installing spaCy from a Jupyter cell with the Python3 kernel selected, using the following commands:
!conda update --all -y
!conda install -n python3 -y -c conda-forge spacy
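Then restart the kernel.
After the restart you should be able to import spacy. Alternatively, you can run the same commands from the Jupyter terminal; just remove the leading ! from each command.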
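To confirm that the restarted kernel really sees the package, a small check like the one below can be run in a cell. It is a minimal sketch using only the standard library; `describe_module` is a hypothetical helper name, not part of spaCy or SageMaker.

```python
import importlib.util
import sys


def describe_module(name):
    """Report whether `name` can be imported by the interpreter behind this kernel."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return f"{name} is not importable from {sys.executable}"
    return f"{name} found at {spec.origin}"


# sys.executable shows which Python the kernel runs, which helps spot
# an install that went into a different conda environment.
print(describe_module("spacy"))
```

If the message says the module is not importable, compare the interpreter path it prints with the environment that the conda command installed into.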