I'm building an ETL pipeline with AWS Glue, and I'm hitting this error when running the job:
"TypeError: unsupported operand type(s) for +: 'DynamicFrame' and 'str'"
The job processes some data and then writes it out to a PostgreSQL database.
The job seems to run fine in that the processing works and the PSQL database gets updated, yet it still reports this error on every run.
I'm a bit confused, because I'm basically running a modified version of the stock AWS job script.
Here's my code:
import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.dynamicframe import DynamicFrame
import pyspark.sql.functions
from pyspark.sql.functions import to_date
from pyspark.sql.functions import input_file_name
from pyspark.sql.functions import current_timestamp
## @params: [JOB_NAME]
args = getResolvedOptions(sys.argv, ['JOB_NAME'])
sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args['JOB_NAME'], args)
# Create a DynamicFrame using the Service ROs table
ros_DyF = glueContext.create_dynamic_frame.from_catalog(database="DB",
table_name="TB", transformation_ctx = "ros_DyF")
# Do a bunch of processing...code not included...
# Update the tables in postgreSQL
psql_conn_options = {'database' : 'DB', 'dbtable' : 'TB'}
psql_tmp_dir = "TMPDIR"
datasink4 = glueContext.write_dynamic_frame.from_jdbc_conf(
frame = mapped_dyF,
catalog_connection = 'wizelyPSQL',
connection_options = psql_conn_options,
redshift_tmp_dir = psql_tmp_dir,
transformation_ctx = "datasink4")
job.commit()
Here's the error I get:
Traceback (most recent call last):
File "script_2018-07-09-19-30-30.py", line 168, in <module>
transformation_ctx = "datasink4")
File "/mnt/yarn/usercache/root/appcache/application_1531164400757_0001/container_1531164400757_0001_01_000001/PyGlue.zip/awsglue/dynamicframe.py", line 597, in from_jdbc_conf
File "/mnt/yarn/usercache/root/appcache/application_1531164400757_0001/container_1531164400757_0001_01_000001/PyGlue.zip/awsglue/context.py", line 262, in write_dynamic_frame_from_jdbc_conf
File "/mnt/yarn/usercache/root/appcache/application_1531164400757_0001/container_1531164400757_0001_01_000001/PyGlue.zip/awsglue/context.py", line 278, in write_from_jdbc_conf
File "/mnt/yarn/usercache/root/appcache/application_1531164400757_0001/container_1531164400757_0001_01_000001/PyGlue.zip/awsglue/data_sink.py", line 32, in write
File "/mnt/yarn/usercache/root/appcache/application_1531164400757_0001/container_1531164400757_0001_01_000001/PyGlue.zip/awsglue/data_sink.py", line 28, in writeFrame
TypeError: unsupported operand type(s) for +: 'DynamicFrame' and 'str'
End of LogType:stdout
Any suggestions?
Answer 0 (score: 0)
It looks like a bug in the PyGlue lib, but I checked the source code and there is nothing suspicious there. Here is the line:
28. return DynamicFrame(self._jsink.pyWriteDynamicFrame(dynamic_frame._jdf, callsite(), info), dynamic_frame.glue_ctx, dynamic_frame.name + "_errors")
If that line looked like this instead (with .name removed from the last argument), it would produce exactly the error you're getting:
28. return DynamicFrame(self._jsink.pyWriteDynamicFrame(dynamic_frame._jdf, callsite(), info), dynamic_frame.glue_ctx, dynamic_frame + "_errors")
In that case your job would still complete normally, because self._jsink.pyWriteDynamicFrame(...) is evaluated before the string concatenation that raises the error.
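For what it's worth, the TypeError itself is easy to reproduce outside of Glue: Python raises it whenever a string is added with + to an object that doesn't define addition. A minimal sketch using a stand-in class (not the real awsglue DynamicFrame):

class FakeDynamicFrame:
    # Stand-in object with a .name attribute, purely for illustration
    def __init__(self, name):
        self.name = name

frame = FakeDynamicFrame("mapped_dyF")

print(frame.name + "_errors")   # fine: str + str -> 'mapped_dyF_errors'

try:
    frame + "_errors"           # no __add__ defined, so this raises the same kind of TypeError
except TypeError as exc:
    print(exc)                  # unsupported operand type(s) for +: 'FakeDynamicFrame' and 'str'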
If you are using the PySpark lib on a dev endpoint, try downloading the latest version from aws-glue-jes-prod-us-east-1-assets/etl/python/PyGlue.zip. Otherwise, if you are writing the script in the Glue console UI (where the lib is provided by the AWS Glue service), you should contact AWS support.
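As a rough sketch of how you could fetch that archive with boto3 (assuming your credentials can read the aws-glue-jes-prod-us-east-1-assets bucket and that the key matches the path above):

import boto3

s3 = boto3.client("s3")
# Download the PyGlue library archive referenced above into the working directory
s3.download_file(
    "aws-glue-jes-prod-us-east-1-assets",  # bucket from the path above
    "etl/python/PyGlue.zip",               # assumed key for the PyGlue archive
    "PyGlue.zip",                          # local destination filename
)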