I am trying to run an ETL job in AWS Glue using a Python script:
import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job
glueContext = GlueContext(SparkContext.getOrCreate())
person = glueContext.create_dynamic_frame.from_catalog(
    database="test",
    table_name="testetl_person")
person.printSchema()
This script works on an AWS development endpoint, but it raises the following exception when run as a job:
File "/tmp/runscript.py", line 118, in <module>
runpy.run_path(temp_file_path, run_name='__main__')
File "/usr/local/lib/python3.6/runpy.py", line 263, in run_path
pkg_name=pkg_name, script_name=fname)
File "/usr/local/lib/python3.6/runpy.py", line 96, in _run_module_code
mod_name, mod_spec, pkg_name, script_name)
File "/usr/local/lib/python3.6/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/tmp/glue-python-scripts-cf4xyag5/test.py", line 2, in <module>
ModuleNotFoundError: No module named 'awsglue.transforms'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/tmp/runscript.py", line 137, in <module>
raise e_type(e_value).with_tracsback(new_stack)
AttributeError: 'ModuleNotFoundError' object has no attribute 'with_tracsback'
Can anyone help me? Let me know if you need more information.
Answer 0 (score: 1)
According to this, the language (job type) used to run your code should be set to Spark, not Python.
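If the answer above refers to the Glue job type, the same setting can be expressed when creating the job through boto3, where `Command.Name` is `"glueetl"` for a Spark job and `"pythonshell"` for a Python shell job. The following is only a sketch: the helper `build_glue_job_args`, the job name, the role ARN, and the S3 script path are all hypothetical placeholders, not values from the question.

```python
def build_glue_job_args(name, role_arn, script_location, spark=True):
    """Build keyword arguments for boto3's glue.create_job (hypothetical helper)."""
    return {
        "Name": name,
        "Role": role_arn,
        "Command": {
            # "glueetl" selects a Spark job, whose runtime provides awsglue;
            # "pythonshell" selects a plain Python shell job.
            "Name": "glueetl" if spark else "pythonshell",
            "ScriptLocation": script_location,
            "PythonVersion": "3",
        },
    }


# All values below are placeholders for illustration only.
job_args = build_glue_job_args(
    name="test-etl",
    role_arn="arn:aws:iam::123456789012:role/MyGlueRole",
    script_location="s3://my-bucket/scripts/test.py",
)
print(job_args["Command"]["Name"])

# To actually create the job (requires boto3 and AWS credentials):
#   import boto3
#   boto3.client("glue").create_job(**job_args)
```

Keeping the argument construction separate from the AWS call makes it easy to inspect the job type before anything is created.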
Answer 1 (score: 0)
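If you are running Spark from a SageMaker notebook against a Glue development endpoint, this may be a permissions issue, as highlighted in this AWS forum thread. The Glue development endpoint needs the following IAM policy in order to download the required awsglue library from the S3 bucket provided by AWS:
arn:aws:iam::aws:policy/service-role/AWSGlueServiceNotebookRole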
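One way to attach that managed policy is boto3's `attach_role_policy` call on the IAM client. This is a minimal sketch, assuming the policy goes on the role used by the notebook; the role name `MyGlueNotebookRole`, the helper `attach_notebook_policy`, and the `RecordingClient` stand-in (used so the example runs without AWS credentials) are all hypothetical.

```python
POLICY_ARN = "arn:aws:iam::aws:policy/service-role/AWSGlueServiceNotebookRole"


def attach_notebook_policy(iam_client, role_name):
    """Attach the managed notebook policy to the given role."""
    iam_client.attach_role_policy(RoleName=role_name, PolicyArn=POLICY_ARN)


class RecordingClient:
    """Stand-in for an IAM client that only records calls (for a dry run)."""

    def __init__(self):
        self.calls = []

    def attach_role_policy(self, **kwargs):
        self.calls.append(kwargs)


# Dry run with the stand-in client; the role name is a placeholder.
dry_run = RecordingClient()
attach_notebook_policy(dry_run, "MyGlueNotebookRole")
print(dry_run.calls)

# Against a real account (requires boto3 and IAM permissions):
#   import boto3
#   attach_notebook_policy(boto3.client("iam"), "MyGlueNotebookRole")
```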
Answer 2 (score: 0)
You probably selected a python notebook instead of a pyspark notebook. You have to select a pyspark notebook.
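A quick way to see whether the current kernel can find the Glue libraries at all is to probe for the modules before importing them. This is only a diagnostic sketch using the standard library; `module_available` is a hypothetical helper name, and a `False` result for `awsglue` suggests the code is running in an environment without the Glue runtime, such as a plain Python kernel.

```python
import importlib.util


def module_available(name):
    """Return True if the named module can be located by the import system."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # Raised when a parent package is missing or a module has no spec.
        return False


for mod in ("pyspark", "awsglue", "awsglue.transforms"):
    print(mod, "available" if module_available(mod) else "missing")
```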
Answer 3 (score: 0)
Not 100% sure, but one possible solution is: