We have an EMR cluster that was created with the default SSE encryption. We now need to use this same cluster to process data in S3 that is encrypted with CSE. Can we decrypt those CSE-encrypted S3 source files on the fly in EMR using KMS? We are processing the data with PySpark.
We could create a new EMR cluster with CSE enabled, but we need to use the same (SSE-only) cluster to process the CSE-encrypted data in S3. This is what we tried from PySpark on the existing cluster:
sc = spark.sparkContext
hconf = sc._jsc.hadoopConfiguration()

# Enable EMRFS client-side encryption and point it at the custom materials provider.
hconf.set("fs.s3.cse.enabled", "true")
hconf.set("fs.s3.cse.encryptionMaterialsProvider",
          "com.amazon.s3.encryption.provider.EmrEncryptionMaterialsProvider")
hconf.set("fs.s3.cse.encryptionMaterialsProvider.uri", "s3 path to jar for the provider")

# Provider-specific (odinKms) settings: KMS key, region and key-ring location.
hconf.set("com.amazon.odinKms.kmsKeyId", "XXXXXXXXXXXXXXXXXXXXXXXX")
hconf.set("com.amazon.odinKms.kmsKeyRegion", "us-east-1")
hconf.set("com.amazon.odinKms.defaultMaterialSetName", "Name of the material set")
hconf.set("com.amazon.odinKms.keyRingBucketName", "s3 bucket name")
hconf.set("com.amazon.odinKms.keyRingPrefix", "keyring/")
hconf.set("com.amazon.odinKms.debugConfig", "false")

rdd_mdb_table = sc.textFile(src_s3_path, 40)
rdd_mdb_table.take(1)
The file cannot be read because decryption fails. The property values can be assumed to be correct, because we use the same properties in the configuration of another EMR cluster that was created with CSE enabled, and reading works there.
Actual result:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/spark/python/pyspark/rdd.py", line 1313, in take
totalParts = self.getNumPartitions()
File "/usr/lib/spark/python/pyspark/rdd.py", line 385, in getNumPartitions
return self._jrdd.partitions().size()
File "/usr/lib/spark/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1133, in __call__
File "/usr/lib/spark/python/pyspark/sql/utils.py", line 63, in deco
return f(*a, **kw)
File "/usr/lib/spark/python/lib/py4j-0.10.4-src.zip/py4j/protocol.py", line 319, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o201.partitions.
: com.amazon.ws.emr.hadoop.fs.shaded.com.google.inject.ProvisionException: Guice provision errors:
1) Error in custom provider, java.lang.IllegalArgumentException: EncryptionMaterialsProvider not found: com.amazon.s3.encryption.provider.EmrEncryptionMaterialsProvider
at com.amazon.ws.emr.hadoop.fs.guice.EmrFSBaseModule.provideAmazonS3Lite(EmrFSBaseModule.java:99)
at com.amazon.ws.emr.hadoop.fs.guice.EmrFSBaseModule.provideAmazonS3Lite(EmrFSBaseModule.java:99)
while locating com.amazon.ws.emr.hadoop.fs.s3.lite.AmazonS3Lite
for field at com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem.s3(S3NativeFileSystem.java:65)
while locating com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem
while locating org.apache.hadoop.fs.FileSystem annotated with @com.amazon.ws.emr.hadoop.fs.shaded.com.google.inject.name.Named(value=s3n)
Caused by: java.lang.IllegalArgumentException: EncryptionMaterialsProvider not found: com.amazon.s3.encryption.provider.EmrEncryptionMaterialsProvider
at com.amazon.ws.emr.hadoop.fs.util.ConfigurationUtils.getEncryptionMaterialsProviderClass(ConfigurationUtils.java:308)
at com.amazon.ws.emr.hadoop.fs.util.ConfigurationUtils.getEncryptionMaterialsProvider(ConfigurationUtils.java:295)
...
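The error ("EncryptionMaterialsProvider not found") suggests EMRFS cannot load the provider class at all, i.e. the provider jar is not on the classpath when the S3 filesystem is initialized, rather than a decryption failure as such. Below is a minimal, unverified sketch of what we are considering next: set the same fs.s3.cse.* properties through Spark's spark.hadoop.* prefix before the SparkSession (and hence the EMRFS FileSystem) is created, and also put the provider jar on the Spark classpath via spark.jars. All bucket names, paths, key IDs and jar names below are placeholders, not real values, and whether EMRFS on a cluster created without CSE actually honors these runtime settings is exactly what we are asking.

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("read-cse-data-on-sse-cluster")
    # spark.hadoop.* entries are copied into the Hadoop Configuration at startup,
    # before the s3/s3n FileSystem objects are instantiated.
    .config("spark.hadoop.fs.s3.cse.enabled", "true")
    .config("spark.hadoop.fs.s3.cse.encryptionMaterialsProvider",
            "com.amazon.s3.encryption.provider.EmrEncryptionMaterialsProvider")
    .config("spark.hadoop.fs.s3.cse.encryptionMaterialsProvider.uri",
            "s3://my-bucket/jars/cse-provider.jar")            # placeholder S3 path to the provider jar
    .config("spark.hadoop.com.amazon.odinKms.kmsKeyId", "XXXXXXXXXXXXXXXXXXXXXXXX")
    .config("spark.hadoop.com.amazon.odinKms.kmsKeyRegion", "us-east-1")
    # The remaining com.amazon.odinKms.* properties would be set the same way.
    # Also put the provider jar on the driver/executor classpath;
    # /home/hadoop/cse-provider.jar is a placeholder local copy of the jar.
    .config("spark.jars", "/home/hadoop/cse-provider.jar")
    .getOrCreate()
)

rdd = spark.sparkContext.textFile("s3://my-bucket/cse-encrypted-data/", 40)
print(rdd.take(1))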