I am trying to define an output schema that should be a tuple containing two other tuples, i.e. stats:tuple(c:tuple(),d:tuple()).
My code (below) does not work as expected. It somehow produces this structure instead:
stats:tuple(b:tuple(c:tuple(),d:tuple()))
Here is the output produced by describe:
sourceData: {com.mortardata.pig.dataspliter_36: (stats: ((name: chararray,customerId: chararray,VIN: chararray,birth_date: chararray,fuel_mileage: chararray,fuel_consumption: chararray),(name: chararray,customerId: chararray,VIN: chararray,birth_date: chararray,fuel_mileage: chararray,fuel_consumption: chararray)))}
Is it possible to create the structure below instead? In other words, I need to remove tuple b from the previous example.
grunt> describe sourceData;
sourceData: {t: (s: (name: chararray,customerId: chararray,VIN: chararray,birth_date: chararray,fuel_mileage: chararray,fuel_consumption: chararray),n: (name: chararray,customerId: chararray,VIN: chararray,birth_date: chararray,fuel_mileage: chararray,fuel_consumption: chararray))}
Here is the outputSchema code that does not work as expected:
@Override
public Schema outputSchema(Schema input) {
    try {
        // First inner tuple: six chararray columns.
        Schema sensTuple = new Schema();
        sensTuple.add(new Schema.FieldSchema("name", DataType.CHARARRAY));
        sensTuple.add(new Schema.FieldSchema("customerId", DataType.CHARARRAY));
        sensTuple.add(new Schema.FieldSchema("VIN", DataType.CHARARRAY));
        sensTuple.add(new Schema.FieldSchema("birth_date", DataType.CHARARRAY));
        sensTuple.add(new Schema.FieldSchema("fuel_mileage", DataType.CHARARRAY));
        sensTuple.add(new Schema.FieldSchema("fuel_consumption", DataType.CHARARRAY));
        // Second inner tuple: the same six columns.
        Schema nonSensTuple = new Schema();
        nonSensTuple.add(new Schema.FieldSchema("name", DataType.CHARARRAY));
        nonSensTuple.add(new Schema.FieldSchema("customerId", DataType.CHARARRAY));
        nonSensTuple.add(new Schema.FieldSchema("VIN", DataType.CHARARRAY));
        nonSensTuple.add(new Schema.FieldSchema("birth_date", DataType.CHARARRAY));
        nonSensTuple.add(new Schema.FieldSchema("fuel_mileage", DataType.CHARARRAY));
        nonSensTuple.add(new Schema.FieldSchema("fuel_consumption", DataType.CHARARRAY));
        // parentTuple holds the two inner tuples (aliases left null).
        Schema parentTuple = new Schema();
        parentTuple.add(new Schema.FieldSchema(null, sensTuple, DataType.TUPLE));
        parentTuple.add(new Schema.FieldSchema(null, nonSensTuple, DataType.TUPLE));
        // Wrap parentTuple in a tuple field named "stats"...
        Schema outputSchema = new Schema();
        outputSchema.add(new Schema.FieldSchema("stats", parentTuple, DataType.TUPLE));
        // ...then wrap that again in a tuple field named after the UDF.
        return new Schema(new Schema.FieldSchema(
                getSchemaName(this.getClass().getName().toLowerCase(), input),
                outputSchema, DataType.TUPLE));
    } catch (FrontendException e) {
        // The FieldSchema(String, Schema, byte) constructor declares FrontendException.
        throw new RuntimeException(e);
    }
}
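To see where the extra level comes from, here is a toy model in plain Java. `Field` is a hypothetical stand-in for Pig's `Schema.FieldSchema` (it is not Pig's API), and `render()` only approximates how `describe` prints a schema. The assumption being illustrated is that every `FieldSchema(..., DataType.TUPLE)` wrapper adds exactly one tuple level, so wrapping `parentTuple` first in "stats" and then in the UDF-named tuple gives two levels around the inner tuples, while a single wrapper with named inner tuples gives the shape shown in the desired `describe` output.

```java
import java.util.List;
import java.util.stream.Collectors;

// Toy model only: Field is a hypothetical stand-in for Pig's Schema.FieldSchema,
// used to show how each tuple wrapper adds one nesting level. Not Pig's API.
class SchemaNestingDemo {

    // A field has an optional alias; tuple fields also carry inner fields
    // (inner == null means a scalar chararray column).
    record Field(String alias, List<Field> inner) {
        static Field chararray(String alias) {
            return new Field(alias, null);
        }

        static Field tuple(String alias, List<Field> inner) {
            return new Field(alias, inner);
        }

        // Render roughly the way describe prints a field.
        String render() {
            String prefix = alias == null ? "" : alias + ": ";
            if (inner == null) {
                return prefix + "chararray";
            }
            return prefix + inner.stream()
                    .map(Field::render)
                    .collect(Collectors.joining(",", "(", ")"));
        }
    }

    // Two columns are enough to show the shape (the real UDF has six).
    static List<Field> columns() {
        return List.of(Field.chararray("name"), Field.chararray("customerId"));
    }

    // Mirrors the question: unnamed inner tuples -> "stats" tuple -> UDF-named tuple.
    static String wrappedTwice() {
        Field stats = Field.tuple("stats",
                List.of(Field.tuple(null, columns()), Field.tuple(null, columns())));
        return Field.tuple("udf", List.of(stats)).render();
    }

    // One wrapper with named inner tuples, the shape of the desired describe output.
    static String wrappedOnce() {
        return Field.tuple("t",
                List.of(Field.tuple("s", columns()), Field.tuple("n", columns()))).render();
    }

    public static void main(String[] args) {
        System.out.println(wrappedTwice());
        System.out.println(wrappedOnce());
    }
}
```

In the real UDF, the analogous change would presumably be to return `new Schema(new Schema.FieldSchema("t", parentTuple, DataType.TUPLE))` directly (skipping the second wrapper) and to give the two inner `FieldSchema`s the aliases "s" and "n"; this has not been verified against a running Pig instance.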
The UDF's exec method returns:
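public Tuple exec(Tuple tuple) throws IOException {
    // tuple1 and tuple2 are the two inner tuples (their construction is not shown).
    Tuple parentTuple = mTupleFactory.newTuple();
    parentTuple.append(tuple1);
    parentTuple.append(tuple2);
    return parentTuple;
}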
Thanks!