Spark 2.4: Writing a DataFrame to an S3 bucket

Date: 2020-04-22 07:11:07

Tags: scala apache-spark amazon-s3

I am trying to write a DataFrame to S3 from my local PC. Here is my code snippet.

sparkContext.hadoopConfiguration.set("fs.s3a.awsAccessKeyId", Util.AWS_ACCESS_KEY)
sparkContext.hadoopConfiguration.set("fs.s3a.awsSecretAccessKey", Util.AWS_SECRET_ACCESS_KEY)
sparkContext.hadoopConfiguration.set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
empTableDF.coalesce(1).write
  .format("csv")
  .option("header", "true")
  .mode(SaveMode.Overwrite)
  .save("s3a://welpocstg/")

At runtime I get the following exception:

com.amazonaws.AmazonClientException: Unable to load AWS credentials from any provider in the chain

My pom.xml:

        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
            <version>2.7.7</version>
        </dependency>

        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-client</artifactId>
            <version>2.7.7</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-aws</artifactId>
            <version>2.7.7</version>
        </dependency>
        <!-- https://mvnrepository.com/artifact/org.apache.httpcomponents/httpclient -->
        <dependency>
            <groupId>org.apache.httpcomponents</groupId>
            <artifactId>httpclient</artifactId>
            <version>4.5.6</version>
        </dependency>

1 answer:

Answer 0 (score: 1)

Could you try the following changes?

from django.db import models
from django.utils import timezone
from django.contrib.auth.models import User
from PIL import Image
from django.urls import reverse


class Post(models.Model):
    title = models.CharField(max_length=100)
    img = models.ImageField(upload_to='pics')
    content = models.TextField()
    date_posted = models.DateTimeField(default=timezone.now)
    author = models.ForeignKey(User, on_delete=models.CASCADE)

    def __str__(self):
        return self.title

    def get_absolute_url(self):
        return reverse('User-Posts-Details', kwargs={'pk': self.pk})


class Profile(models.Model):
    user = models.OneToOneField(User, on_delete=models.CASCADE)
    bio = models.TextField(max_length=300)
    image = models.ImageField(default='default.jpg', upload_to='profile_pics')

    def __str__(self):
        return f'{self.user.username} Profile'

    def save(self, *args, **kwargs):
        # Forward any arguments (force_insert, using, ...) to the base implementation.
        super().save(*args, **kwargs)

        # Shrink the stored image in place if either dimension exceeds 300 px.
        img = Image.open(self.image.path)

        if img.height > 300 or img.width > 300:
            output_size = (300, 300)
            img.thumbnail(output_size)
            img.save(self.image.path)


class Comments(models.Model):
    Post = models.ForeignKey(Post, on_delete=models.CASCADE, related_name='comments')
    user_id = models.ForeignKey(User, on_delete=models.CASCADE, default=True)
    comment = models.TextField()
    commented = models.DateTimeField(default=timezone.now)
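
Regarding the credentials exception in the question: one likely cause is the property names. The S3A connector in hadoop-aws 2.7.x reads its keys from `fs.s3a.access.key` and `fs.s3a.secret.key`; the `awsAccessKeyId` / `awsSecretAccessKey` suffixes belong to the older `s3n` and `s3` connectors, so S3A ignores them and falls through the rest of the provider chain. Below is a minimal sketch of the write with the S3A names. It is not taken from the original answer. The object name, the two-row sample DataFrame, and reading the keys from the `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` environment variables are assumptions made for illustration; the question reads them from `Util` instead.

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

// Hypothetical driver object; only the three hadoopConfiguration keys and the
// write chain correspond to the code in the question.
object WriteCsvToS3 {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("write-csv-to-s3")
      .master("local[*]") // local run, as in the question
      .getOrCreate()

    val hadoopConf = spark.sparkContext.hadoopConfiguration
    // S3A property names. Assumption: credentials come from environment
    // variables here; sys.env throws NoSuchElementException if one is unset.
    hadoopConf.set("fs.s3a.access.key", sys.env("AWS_ACCESS_KEY_ID"))
    hadoopConf.set("fs.s3a.secret.key", sys.env("AWS_SECRET_ACCESS_KEY"))
    hadoopConf.set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")

    import spark.implicits._
    // Stand-in for the question's empTableDF (sample rows are made up).
    val empTableDF = Seq((1, "Alice"), (2, "Bob")).toDF("id", "name")

    empTableDF.coalesce(1).write
      .format("csv")
      .option("header", "true")
      .mode(SaveMode.Overwrite)
      .save("s3a://welpocstg/") // bucket name from the question

    spark.stop()
  }
}
```

If the exception persists after renaming the properties, it may be worth checking that the values being set are non-empty at runtime, since an empty key also leaves the provider chain with nothing to load.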