Question

I am trying to read a configuration file from an S3 bucket in a Python script run by AWS Glue, but the file is not being read.

from __future__ import print_function
from pyspark import SparkContext
from awsglue.utils import getResolvedOptions
import sys
import ConfigParser
from os import path
sc = SparkContext.getOrCreate()
print("My New Job")

args = getResolvedOptions(sys.argv, ['config_path', 'section_type'])
config_path = args["config_path"]
print("config_path")
print(config_path)

config = ConfigParser.ConfigParser()
filenameread = config.read(['s3://temp-bucket/config/myconfig.ini','s3://temp-bucket/config/myconfig.config','s3://temp-bucket/config/myconfig.txt'])
print("filenameread")
print(filenameread)
print("sections")
print(config.sections())

The log shows the following. The list of config sections is empty.

{
  "timestamp": 1551705899133,
  "message": "Container: container_somenumber on ip-somenumber.ec2.internal_somenumber\nLogType:stdout\nLog Upload Time:Mon Mar 04 13:24:51 +0000 2019\nLogLength:175\nLog Contents:\nMy New Job\nconfig_path\ns3://temp-bucket/config/myconfig.ini\nfilenameread\n[]\nsections\n[]\nEnd of LogType:stdout\n",
   "ingestionTime": 1551705899785
 },

Can anyone help? I am using Python 2.7.15.

Answer 1

This can be achieved by defining the --extra-files key in the job parameters section.
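As a rough illustration, the job parameter could look like the key/value pair built below. This is only a sketch: the bucket path is the one from the question, and the variable names config_files and job_arguments are made up for the example. A dictionary of this shape is what you would enter in the console's job parameters, or pass as DefaultArguments if you create the job with boto3.

```python
# S3 paths of the files the job needs (path taken from the question;
# replace with your own bucket and key).
config_files = [
    "s3://temp-bucket/config/myconfig.ini",
]

# Glue job parameters are plain string key/value pairs.
# Several files go into one value, separated by commas.
job_arguments = {
    "--extra-files": ",".join(config_files),
}

print(job_arguments)
```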

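As a result, the files at the paths given in this key's value (comma-separated for multiple files) are copied into the runtime environment when the job runs. The file can then be read with configparser like any regular configuration file.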
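A minimal sketch of the reading side, assuming the file was listed in --extra-files and that Glue placed a copy in the script's working directory under its base name. The helper load_config and its search_dir parameter are hypothetical names for this example. Note that ConfigParser.read() silently skips any file it cannot open, which is why the script in the question printed [] for the s3:// paths instead of raising an error; the helper below turns that silent case into an exception.

```python
import os

try:
    import configparser  # Python 3
except ImportError:
    import ConfigParser as configparser  # Python 2.7, as in the question


def load_config(s3_path, search_dir="."):
    """Read the local copy of a file that was passed via --extra-files.

    s3_path    -- the original S3 path, used only to derive the file name
    search_dir -- directory holding the local copy (assumed to be the
                  script's working directory)
    """
    local_name = os.path.basename(s3_path)  # e.g. "myconfig.ini"
    local_path = os.path.join(search_dir, local_name)

    config = configparser.ConfigParser()
    files_read = config.read(local_path)  # returns the list of files parsed
    if not files_read:
        # read() does not raise for missing files, so check explicitly.
        raise IOError("Config file not found locally: " + local_path)
    return config
```

In the script from the question, this could be called as config = load_config(args["config_path"]), after which config.sections() should list the sections defined in the file.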

For more details, see https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-glue-arguments.html

Configparser in AWS Glue cannot read a config file from S3
