Question

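We have some code that downloads a batch of S3 files to a local directory. The list of files to retrieve comes from a query we run, and that query only lists files that actually exist in our S3 bucket.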

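When we loop through and retrieve these files, about 10% of them return a 404 error, as if the file did not exist. I log the name/location of each failed file so I can go to S3 and check, and every single one IS on S3 at exactly the location we requested.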

Why would S3 throw a 404 when the file exists?

Here is the Groovy code for the script.

class RetrieveS3FilesFromCSVLoader implements Loader {

private static String missingFilesFile = "00-MISSED_FILES.csv"
private static String csvFileName = "/csv/s3file2.csv"
private static String saveFilesToLocation = "/tmp/retrieve/"

public static final char SEPARATOR = ','

@Autowired
DocumentFileService documentFileService

private void readWithCommaSeparatorSQL() {

    int counter = 0
    String fileName
    String fileLocation
    File missedFiles = new File(saveFilesToLocation + missingFilesFile)
    PrintWriter writer = new PrintWriter(missedFiles)
    File fileCSV = new File(getClass().getResource(csvFileName).toURI())

    fileCSV.splitEachLine(SEPARATOR as String) { nextLine ->
        //if (counter < 15) {
            if (nextLine != null && (nextLine[0] != 'FileLocation')) {
                counter++
                try {
                    // The first CSV column holds the S3 key of the file to fetch.
                    fileLocation = nextLine[0].trim()

                    byte[] fileBytes = documentFileService.getFile(fileLocation)
                    if (fileBytes != null) {
                        fileName = fileLocation.substring(fileLocation.indexOf("/") + 1, fileLocation.length())
                        File file = new File(saveFilesToLocation + fileName)
                        file.withOutputStream {
                            it.write fileBytes
                        }
                        println "$counter) Wrote file ${fileLocation} to ${saveFilesToLocation + fileName}"
                    } else {
                        println "$counter) UNABLE TO RETRIEVE FILE ELSE: $fileLocation"
                        writer.println(fileLocation)
                    }

                } catch (Exception e) {
                    println "$counter) UNABLE TO RETRIEVE FILE: $fileLocation"
                    println(e.getMessage())
                    writer.println(fileLocation)
                }
            } else {
                counter++;
            }
        //}
    }
    writer.close()
}
}

Below is the code for getFile(fileLocation) and for creating the client.

public byte[] getFile(String filename) throws IOException {
    AmazonS3Client s3Client = connectToAmazonS3Service();
    // A missing key raises AmazonS3Exception (HTTP 404) rather than returning null.
    S3Object object = s3Client.getObject(S3_BUCKET_NAME, filename);
    if (object == null) {
        return null;
    }
    try {
        return IOUtils.toByteArray(object.getObjectContent());
    } finally {
        // Always release the underlying HTTP connection, even if the read fails.
        object.close();
    }
}

/**
 * Connects to Amazon S3
 *
 * @return instance of AmazonS3Client
 */
private AmazonS3Client connectToAmazonS3Service() {
    AWSCredentials credentials;
    try {
        credentials = new BasicAWSCredentials(S3_ACCESS_KEY_ID, S3_SECRET_ACCESS_KEY);
    } catch (Exception e) {
        throw new AmazonClientException(
                "Cannot load the credentials from the credential profiles file. " +
                        "Please make sure that your credentials file is at the correct " +
                        "location (~/.aws/credentials), and is in valid format.",
                e);
    }

    AmazonS3Client s3 = new AmazonS3Client(credentials);
    // The variable was named usWest2 but actually holds US_EAST_1; renamed to avoid confusion.
    Region region = Region.getRegion(Regions.US_EAST_1);
    s3.setRegion(region);

    return s3;
}

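The code above works for 90% of the files in the list passed to the script, but we know for a fact that 100% of the files exist in S3 at the location strings we are passing.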

Answer 1

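I'm just an idiot. I thought the properties file held the production AWS credentials. Instead, it held the development credentials, so I was using the wrong credentials.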
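
For anyone hitting the same thing, here is a minimal sketch of a startup check that might have surfaced the mix-up sooner: it logs a masked access key ID and the bucket name before any download runs. The property names `s3.accessKeyId` and `s3.bucketName`, the class name, and the sample values are all hypothetical; the original post does not show the properties file, so adapt them to whatever your configuration actually uses.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

public class CredentialSanityCheck {

    // Keep only the last four characters so the key ID can be logged safely.
    static String maskKeyId(String keyId) {
        if (keyId == null || keyId.length() <= 4) {
            return "****";
        }
        return "****" + keyId.substring(keyId.length() - 4);
    }

    // One-line summary of which key and bucket the job is about to use.
    // Property names are assumptions, not taken from the original post.
    static String describe(Properties props) {
        return "accessKeyId=" + maskKeyId(props.getProperty("s3.accessKeyId"))
                + " bucket=" + props.getProperty("s3.bucketName", "<unset>");
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the real properties file; values are made up.
        Properties props = new Properties();
        props.load(new StringReader(
                "s3.accessKeyId=AKIAEXAMPLEDEV1234\n"
                + "s3.bucketName=example-dev-bucket\n"));
        System.out.println(describe(props));
    }
}
```

Printing this line at the top of the run makes it easy to compare the key suffix and bucket against what you expect for production before trusting any 404s.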

AWS S3 returns 404 for a file that definitely still exists
