Reading Parquet files from S3 with a custom endpoint in Java

Date: 2020-02-23 20:21:34

Tags: java amazon-s3 parquet endpoint

I'm trying to figure out the best way to read Parquet data from S3 storage. First approach:

// Static key/secret pair; BasicSessionCredentials is only needed with a real session token.
BasicAWSCredentials cred = new BasicAWSCredentials(key, secret);
AmazonS3 client = AmazonS3ClientBuilder
        .standard()
        .withEndpointConfiguration(new AwsClientBuilder.EndpointConfiguration("custom_endpoint", region))
        .withCredentials(new AWSStaticCredentialsProvider(cred))
        .build();

// The third constructor argument is a version ID, so it is omitted to fetch the latest version.
GetObjectRequest req = new GetObjectRequest("bucket_name", "relative_path");
S3Object obj = client.getObject(req);
S3ObjectInputStream cont = obj.getObjectContent();

This way I can read the object, but I can't find a way to read Parquet data from the InputStream.
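One reason a plain InputStream is awkward here: a Parquet file keeps its metadata in a footer at the end of the file, so a reader needs random access rather than a forward-only stream. The sketch below illustrates that layout on an in-memory byte array. `ParquetFooterProbe` is a hypothetical helper, not part of any library, and it assumes a file with a plaintext (unencrypted) footer.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical helper for illustration only; not part of Parquet or the AWS SDK. */
final class ParquetFooterProbe {
    // Parquet files with a plaintext footer start and end with these 4 bytes.
    private static final byte[] MAGIC = "PAR1".getBytes(StandardCharsets.US_ASCII);

    /** Returns true if the buffer starts and ends with the Parquet magic bytes. */
    static boolean hasParquetMagic(byte[] file) {
        // Smallest plausible layout: leading magic + footer length + trailing magic.
        if (file.length < 12) {
            return false;
        }
        byte[] head = Arrays.copyOfRange(file, 0, 4);
        byte[] tail = Arrays.copyOfRange(file, file.length - 4, file.length);
        return Arrays.equals(head, MAGIC) && Arrays.equals(tail, MAGIC);
    }

    /** Reads the footer length: a 4-byte little-endian int just before the trailing magic. */
    static int footerLength(byte[] file) {
        if (!hasParquetMagic(file)) {
            throw new IllegalArgumentException("not a Parquet file");
        }
        return ByteBuffer.wrap(file, file.length - 8, 4)
                .order(ByteOrder.LITTLE_ENDIAN)
                .getInt();
    }
}
```

With Java 9 or later, the bytes could come from `cont.readAllBytes()` on the stream above, which is only reasonable for files that fit in memory. From there, one possible route (an assumption, not something tested against this setup) is to wrap the byte array in a custom implementation of Parquet's `InputFile` interface and pass it to a reader builder that accepts an `InputFile`.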

Second approach:

String SCHEMA_TEMPLATE = "{" +
                        "\"type\": \"record\",\n" +
                        "    \"name\": \"schema\",\n" +
                        "    \"fields\": [\n" +
                        "        {\"name\": \"timeStamp\", \"type\": \"string\"},\n" +
                        "        {\"name\": \"temperature\", \"type\": \"double\"},\n" +
                        "        {\"name\": \"pressure\", \"type\": \"double\"}\n" +
                        "    ]" +
                        "}";
String PATH_SCHEMA = "s3a";
Path internalPath = new Path(PATH_SCHEMA, bucketName, folderName);
Schema schema = new Schema.Parser().parse(SCHEMA_TEMPLATE);
Configuration configuration = new Configuration();
configuration.set("fs.s3a.access.key", "key");
configuration.set("fs.s3a.secret.key", "secret");
configuration.set("fs.s3a.endpoint", "custom_endpoint");
AvroReadSupport.setRequestedProjection(configuration, schema);
ParquetReader<GenericRecord> parquetReader = AvroParquetReader.<GenericRecord>builder(internalPath).withConf(configuration).build();
GenericRecord genericRecord = parquetReader.read();

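while (genericRecord != null) {
        // A lambda can only capture effectively final variables, so copy the loop variable first.
        final GenericRecord current = genericRecord;
        Map<String, String> valuesMap = new HashMap<>();
        // String.valueOf avoids a NullPointerException when a field value is null.
        current.getSchema().getFields().forEach(field ->
                valuesMap.put(field.name(), String.valueOf(current.get(field.name()))));

        genericRecord = parquetReader.read();
}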

But in the second case I can't read any data and get a SocketTimeoutException. Please help me find the right approach. Thanks.
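A possible cause worth checking, offered as a guess rather than a diagnosis: with a custom endpoint, the S3A connector may build virtual-hosted-style URLs (bucket name as a subdomain of the endpoint) that do not resolve or connect, or it may use HTTPS against an HTTP-only endpoint. The sketch below collects S3A properties that are commonly adjusted in that situation. `S3aEndpointSettings` is a hypothetical helper, and the timeout and retry values are illustrative, not recommendations.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper for illustration only; the property names are Hadoop S3A settings. */
final class S3aEndpointSettings {

    /** Builds S3A properties for a non-AWS endpoint; the values here are assumptions. */
    static Map<String, String> forCustomEndpoint(String endpoint, boolean useSsl) {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("fs.s3a.endpoint", endpoint);
        // Address buckets as endpoint/bucket instead of bucket.endpoint.
        props.put("fs.s3a.path.style.access", "true");
        // Must match the scheme the endpoint actually serves.
        props.put("fs.s3a.connection.ssl.enabled", String.valueOf(useSsl));
        // Illustrative timeouts in milliseconds, so failures surface quickly while debugging.
        props.put("fs.s3a.connection.establish.timeout", "5000");
        props.put("fs.s3a.connection.timeout", "200000");
        props.put("fs.s3a.attempts.maximum", "3");
        return props;
    }
}
```

The resulting map could be applied to the Hadoop `Configuration` from the second approach with `S3aEndpointSettings.forCustomEndpoint("custom_endpoint", false).forEach(configuration::set);` before the reader is built.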

0 answers:

No answers yet.