AWS Glue TypeError: unsupported operand type(s) for +: 'DynamicFrame' and 'str'

Asked: 2018-07-09 19:57:15

Tags: amazon-web-services aws-glue

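I am building an ETL pipeline with AWS Glue, and I get this error when I run the job:

"TypeError: unsupported operand type(s) for +: 'DynamicFrame' and 'str'"

The job processes data and then writes it to a PostgreSQL database.

The job appears to run fine, in the sense that the processing works and the PSQL database is being updated, but it still reports this error on every run.

I'm a bit confused, because I am basically using a modified version of the stock AWS job script.

Here is my code: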

import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.dynamicframe import DynamicFrame

import pyspark.sql.functions
from pyspark.sql.functions import to_date
from pyspark.sql.functions import input_file_name
from pyspark.sql.functions import current_timestamp

## @params: [JOB_NAME]
args = getResolvedOptions(sys.argv, ['JOB_NAME'])

sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args['JOB_NAME'], args)

# Create a DynamicFrame using the Service ROs table
ros_DyF = glueContext.create_dynamic_frame.from_catalog(
    database="DB", table_name="TB", transformation_ctx="ros_DyF")

# Do a bunch of processing (code not included); this produces mapped_dyF

# Update the tables in PostgreSQL
psql_conn_options = {'database': 'DB', 'dbtable': 'TB'}
psql_tmp_dir = "TMPDIR"
datasink4 = glueContext.write_dynamic_frame.from_jdbc_conf(
    frame = mapped_dyF,
    catalog_connection = 'wizelyPSQL',
    connection_options = psql_conn_options,
    redshift_tmp_dir = psql_tmp_dir,
    transformation_ctx = "datasink4")

job.commit()

This is the error I get:

Traceback (most recent call last):
  File "script_2018-07-09-19-30-30.py", line 168, in <module>
    transformation_ctx = "datasink4")
  File "/mnt/yarn/usercache/root/appcache/application_1531164400757_0001/container_1531164400757_0001_01_000001/PyGlue.zip/awsglue/dynamicframe.py", line 597, in from_jdbc_conf
  File "/mnt/yarn/usercache/root/appcache/application_1531164400757_0001/container_1531164400757_0001_01_000001/PyGlue.zip/awsglue/context.py", line 262, in write_dynamic_frame_from_jdbc_conf
  File "/mnt/yarn/usercache/root/appcache/application_1531164400757_0001/container_1531164400757_0001_01_000001/PyGlue.zip/awsglue/context.py", line 278, in write_from_jdbc_conf
  File "/mnt/yarn/usercache/root/appcache/application_1531164400757_0001/container_1531164400757_0001_01_000001/PyGlue.zip/awsglue/data_sink.py", line 32, in write
  File "/mnt/yarn/usercache/root/appcache/application_1531164400757_0001/container_1531164400757_0001_01_000001/PyGlue.zip/awsglue/data_sink.py", line 28, in writeFrame
TypeError: unsupported operand type(s) for +: 'DynamicFrame' and 'str'
End of LogType:stdout

Any suggestions?

1 Answer:

Answer 0 (score: 0)

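This looks like a bug in the PyGlue lib. However, I checked the source code and found nothing suspicious. Here is line 28 of data_sink.py, the line named in your traceback:

return DynamicFrame(self._jsink.pyWriteDynamicFrame(dynamic_frame._jdf, callsite(), info), dynamic_frame.glue_ctx, dynamic_frame.name + "_errors")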

If the line looked like this instead (with .name dropped from the last argument), it would produce exactly the error you are getting:

return DynamicFrame(self._jsink.pyWriteDynamicFrame(dynamic_frame._jdf, callsite(), info), dynamic_frame.glue_ctx, dynamic_frame + "_errors")

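In that case your job would still do its work, because self._jsink.pyWriteDynamicFrame(...) is evaluated, and the data written, before the string concatenation that raises the error.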
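The ordering argument can be reproduced without Glue. The following is only a sketch: FakeFrame, fake_write and build_result are hypothetical stand-ins for DynamicFrame, pyWriteDynamicFrame and the DynamicFrame constructor, not part of the awsglue API. It shows that Python evaluates call arguments left to right, so the write happens before the bad concatenation raises.

```python
class FakeFrame(object):
    """Hypothetical stand-in for DynamicFrame; it defines no __add__."""
    def __init__(self, name):
        self.name = name


written = []  # records side effects, standing in for rows sent to PostgreSQL


def fake_write(frame):
    """Hypothetical stand-in for self._jsink.pyWriteDynamicFrame(...)."""
    written.append(frame.name)
    return "jdf"


def build_result(jdf, name):
    """Hypothetical stand-in for the DynamicFrame(...) call on line 28."""
    return (jdf, name)


frame = FakeFrame("mapped_dyF")
error = None

try:
    # Arguments are evaluated left to right: fake_write(frame) runs first,
    # then frame + "_errors" raises TypeError before build_result is called.
    build_result(fake_write(frame), frame + "_errors")
except TypeError as exc:
    error = str(exc)

print(written)  # the write already happened
print(error)    # the TypeError raised by the concatenation
```

The write is recorded even though the call as a whole fails, which matches the behaviour described in the question: the database is updated, yet the job reports a TypeError.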

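If you are using the PySpark lib on a dev endpoint, try downloading the latest version from aws-glue-jes-prod-us-east-1-assets/etl/python/PyGlue.zip. Otherwise, if you are writing the script in the Glue console UI (where the AWS Glue service provides the lib), you should contact AWS Support.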
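One way to see which version of that line your environment actually runs is to print the matching source lines with the standard inspect module. This is a rough sketch: source_lines_containing is a hypothetical helper, and it assumes the library's .py source can be retrieved at runtime, which may not hold when the lib is loaded from PyGlue.zip. The DataSink.writeFrame name in the commented-out usage is inferred from the traceback above.

```python
import inspect


def source_lines_containing(obj, needle):
    """Return the stripped source lines of obj that contain needle.

    Returns None when the source cannot be retrieved, for example for
    built-ins or when only compiled bytecode is available.
    """
    try:
        source = inspect.getsource(obj)
    except (IOError, OSError, TypeError):
        return None
    return [line.strip() for line in source.splitlines() if needle in line]


# Assumed usage inside a Glue job or dev endpoint (not runnable elsewhere):
# from awsglue.data_sink import DataSink
# print(source_lines_containing(DataSink.writeFrame, "_errors"))
```

If the printed line ends in dynamic_frame + "_errors" rather than dynamic_frame.name + "_errors", the deployed lib is the faulty version.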