How to apply a filter predicate on an array in Parquet (Java)

Asked: 2018-09-07 05:14:11

Tags: java avro parquet

For example, I have the following avsc file.

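    [
      {
        "type": "record",
        "namespace": "com.example",
        "name": "Customer",
        "fields": [
          {"name": "first_name", "type": "string", "doc": "First name of the customer"},
          {"name": "last_name", "type": "string", "doc": "Last name of the customer"},
          {"name": "age", "type": "int", "doc": "Age at the time of registration"},
          {"name": "height", "type": "float", "doc": "Height at the time of registration, in cm"},
          {"name": "weight", "type": "float", "doc": "Weight at the time of registration, in kg"},
          {"name": "automated_email", "type": "boolean", "default": true, "doc": "Field indicating whether the user is enrolled in marketing emails"}
        ]
      },
      {
        "type": "record",
        "namespace": "com.example",
        "name": "Customers",
        "fields": [
          {"name": "customers", "type": {"type": "array", "items": "com.example.Customer"}, "doc": "Age at the time of registration"}
        ]
      }
    ]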

I have a few customers and have added them to a Customers record:

    Customer.Builder customerBuilder1 = Customer.newBuilder();
    customerBuilder1.setAge(30);
    customerBuilder1.setFirstName("Mark");
    customerBuilder1.setLastName("Simpson");
    customerBuilder1.setAutomatedEmail(true);
    customerBuilder1.setHeight(180f);
    customerBuilder1.setWeight(90f);

    Customer.Builder customerBuilder2 = Customer.newBuilder();
    customerBuilder2.setAge(30);
    customerBuilder2.setFirstName("Vishant");
    customerBuilder2.setLastName("Shah");
    customerBuilder2.setAutomatedEmail(true);
    customerBuilder2.setHeight(181f);
    customerBuilder2.setWeight(65f);

    Customer customer1 = customerBuilder1.build();
    System.out.println("Original : " +customer1.toString());

    Customer customer2 = customerBuilder2.build();
    System.out.println("Original : " + customer2.toString());

    Customers.Builder customersBuilder = Customers.newBuilder();
    customersBuilder.setCustomers(Arrays.asList(customer1, customer2));

    Customers customers = customersBuilder.build();

    //Write parquet file
    try (ParquetWriter<Customers> writer = AvroParquetWriter
            .<Customers>builder(new Path("customers-specific.parquet"))
            .withSchema(customers.getSchema())
            .withConf(new Configuration())
            .withCompressionCodec(CompressionCodecName.SNAPPY)
            .build()) {
        writer.write(customers);
    }

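How can I apply a predicate on the first_name values of the customers inside the array? Without the complex object this would be straightforward, but it does not work with the array.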

    // Attempted filter: fine for a flat Customer file, but not for the nested array.
    // eq and binaryColumn are static imports from org.apache.parquet.filter2.predicate.FilterApi.
    FilterPredicate predicate = eq(binaryColumn("first_name"), Binary.fromString("Vishant"));
    try (ParquetReader<Customers> selectiveReader = AvroParquetReader
            .<Customers>builder(new Path("customers-specific.parquet"))
            .withFilter(FilterCompat.get(predicate))
            .build()) {
        Customers selectedCustomers;
        while ((selectedCustomers = selectiveReader.read()) != null) {
            System.out.println("Selected Read: " + selectedCustomers.toString());
        }
    }
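
One possible fallback, offered only as a sketch: to my knowledge, the parquet-mr releases from around the time of this question reject filter predicates on repeated columns during schema validation, so a predicate cannot be pushed down onto a field inside the array. Under that assumption, the file can be read without a filter and the nested list filtered in application code. The `Customer` class below is a hypothetical stand-in for the Avro-generated class (so the example runs without Parquet or Avro on the classpath), and `selectByFirstName` is an illustrative helper, not part of any library.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class NestedCustomerFilter {

    // Hypothetical stand-in for the Avro-generated Customer class.
    // Avro getters return CharSequence by default, so that is modelled here.
    static final class Customer {
        private final CharSequence firstName;
        private final CharSequence lastName;

        Customer(CharSequence firstName, CharSequence lastName) {
            this.firstName = firstName;
            this.lastName = lastName;
        }

        CharSequence getFirstName() {
            return firstName;
        }

        CharSequence getLastName() {
            return lastName;
        }
    }

    // Illustrative helper: keeps only the customers whose first name equals target.
    // contentEquals compares character content, so it works for any CharSequence.
    static List<Customer> selectByFirstName(List<Customer> customers, String target) {
        return customers.stream()
                .filter(c -> target.contentEquals(c.getFirstName()))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // In the real program this list would come from customers.getCustomers()
        // on each Customers record returned by an unfiltered ParquetReader.
        List<Customer> customers = Arrays.asList(
                new Customer("Mark", "Simpson"),
                new Customer("Vishant", "Shah"));

        for (Customer c : selectByFirstName(customers, "Vishant")) {
            System.out.println("Selected Read: " + c.getFirstName() + " " + c.getLastName());
        }
    }
}
```

This trades predicate pushdown for simplicity: every Customers record is read in full, and the filtering happens in memory. If filtering at the Parquet level matters, it may be worth checking whether the parquet-mr version in use supports predicates on repeated columns.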

0 answers:

No answers yet