我们有一些代码可以将一堆S3文件下载到本地目录。要检索的文件列表来自我们运行的查询。它仅列出我们的S3存储桶中实际存在的文件。
当我们循环检索这些文件时,大约10%的文件会返回404错误,就像文件不存在一样。我注销了该文件的名称/位置,所以我可以去S3并检查,确定我们去的地方的IS ON S3中的每一个都在寻找它。
为什么S3在文件存在时会抛出404?
这是脚本的Groovy代码。
class RetrieveS3FilesFromCSVLoader implements Loader {
private static String missingFilesFile = "00-MISSED_FILES.csv"
private static String csvFileName = "/csv/s3file2.csv"
private static String saveFilesToLocation = "/tmp/retrieve/"
public static final char SEPARATOR = ','
@Autowired
DocumentFileService documentFileService
private void readWithCommaSeparatorSQL() {
int counter = 0
String fileName
String fileLocation
File missedFiles = new File(saveFilesToLocation + missingFilesFile)
PrintWriter writer = new PrintWriter(missedFiles)
File fileCSV = new File(getClass().getResource(csvFileName).toURI())
fileCSV.splitEachLine(SEPARATOR as String) { nextLine ->
//if (counter < 15) {
if (nextLine != null && (nextLine[0] != 'FileLocation')) {
counter++
try {
//Remove 0, only if client number start with "0".
fileLocation = nextLine[0].trim()
byte[] fileBytes = documentFileService.getFile(fileLocation)
if (fileBytes != null) {
fileName = fileLocation.substring(fileLocation.indexOf("/") + 1, fileLocation.length())
File file = new File(saveFilesToLocation + fileName)
file.withOutputStream {
it.write fileBytes
}
println "$counter) Wrote file ${fileLocation} to ${saveFilesToLocation + fileLocation}"
} else {
println "$counter) UNABLE TO RETRIEVE FILE ELSE: $fileLocation"
writer.println(fileLocation)
}
} catch (Exception e) {
println "$counter) UNABLE TO RETRIEVE FILE: $fileLocation"
println(e.getMessage())
writer.println(fileLocation)
}
} else {
counter++;
}
//}
}
writer.close()
}
以下是getFile(fileLocation)和客户端创建的代码。
public byte[] getFile(String filename) throws IOException {
AmazonS3Client s3Client = connectToAmazonS3Service();
S3Object object = s3Client.getObject(S3_BUCKET_NAME, filename);
if(object == null) {
return null;
}
byte[] fileAsArray = IOUtils.toByteArray(object.getObjectContent());
object.close();
return fileAsArray;
}
/**
* Connects to Amazon S3
*
* @return instance of AmazonS3Client
*/
private AmazonS3Client connectToAmazonS3Service() {
AWSCredentials credentials;
try {
credentials = new BasicAWSCredentials(S3_ACCESS_KEY_ID, S3_SECRET_ACCESS_KEY);
} catch (Exception e) {
throw new AmazonClientException(
"Cannot load the credentials from the credential profiles file. " +
"Please make sure that your credentials file is at the correct " +
"location (~/.aws/credentials), and is in valid format.",
e);
}
AmazonS3Client s3 = new AmazonS3Client(credentials);
Region usWest2 = Region.getRegion(Regions.US_EAST_1);
s3.setRegion(usWest2);
return s3;
}
上面的代码适用于传递给脚本的列表中90%的文件,但事实上我们知道所有100%的文件都存在于S3中,而且我们传递的是位置字符串。
答案 0 :(得分:0)
我只是个白痴。认为它在属性文件中具有生产AWS凭证。相反,它是开发凭据。所以我的证书不对。