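I have the code below, in which I read a CSV file and define its schema, and then convert the rows to BeamRecords. After that I apply BeamSql to implement the PTransforms.
Code: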
class Clo {
    public String Outlet;
    public String CatLib;
    public String ProdKey;
    public Date Week;
    public String SalesComponent;
    public String DuetoValue;
    public String PrimaryCausalKey;
    public Float CausalValue;
    public Integer ModelIteration;
    public Integer Published;
}
public static void main(String[] args) {
    PipelineOptions options = PipelineOptionsFactory.create();
    Pipeline p = Pipeline.create(options);
    PCollection<java.lang.String> lines = p.apply(TextIO.read().from("gs://gcpbucket/input/WeeklyDueto.csv"));
    PCollection<Clo> pojos = lines.apply(ParDo.of(new ExtractObjectsFn()));
    List<java.lang.String> fieldNames = Arrays.asList("Outlet", "CatLib", "ProdKey", "Week", "SalesComponent", "DuetoValue", "PrimaryCausalKey", "CausalValue", "ModelIteration", "Published");
    List<java.lang.Integer> fieldTypes = Arrays.asList(Types.VARCHAR, Types.VARCHAR, Types.VARCHAR, Types.DATE, Types.VARCHAR, Types.VARCHAR, Types.VARCHAR, Types.FLOAT, Types.INTEGER, Types.INTEGER);
    BeamRecordSqlType appType = BeamRecordSqlType.create(fieldNames, fieldTypes);
    PCollection<BeamRecord> apps = pojos.apply(
        ParDo.of(new DoFn<Clo, BeamRecord>() {
            @ProcessElement
            public void processElement(ProcessContext c) {
                // Field values must be passed in the same order as fieldNames/fieldTypes.
                BeamRecord br = new BeamRecord(
                    appType,
                    c.element().Outlet,
                    c.element().CatLib,
                    c.element().ProdKey,
                    c.element().Week,
                    c.element().SalesComponent,
                    c.element().DuetoValue,
                    c.element().PrimaryCausalKey,
                    c.element().CausalValue,
                    c.element().ModelIteration,
                    c.element().Published
                );
                c.output(br);
            }
        })).setCoder(appType.getRecordCoder()); // BeamRecord PCollections need the schema's record coder
    PCollection<BeamRecord> out = apps.apply(BeamSql.query("select Outlet from PCOLLECTION"));
    out.apply("WriteMyFile", TextIO.write().to("gs://gcpbucket/output/sbc.txt"));
}
My questions:
I have implemented ExtractObjectsFn() as follows:
public void processElement(ProcessContext c) {
    ArrayList<Clo> clx = new ArrayList<Clo>();
    java.lang.String[] strArr = c.element().split("\n");
    for (int i = 0; i < strArr.length; i++) {
        Clo clo = new Clo();
        java.lang.String[] temp = strArr[i].split(",");
        clo.setCatLib(temp[1]);
        clo.setCausalValue(temp[7]);
        clo.setDuetoValue(temp[5]);
        clo.setModelIteration(temp[8]);
        clo.setOutlet(temp[0]);
        clo.setPrimaryCausalKey(temp[6]);
        clo.setProdKey(temp[2]);
        clo.setPublished(temp[9]);
        clo.setSalesComponent(temp[4]);
        clo.setWeek(temp[3]);
        c.output(clo);
        clx.add(clo);
    }
}
Please let me know whether this is done correctly, because when I run the code I get the error: No Coder has been manually specified; you may do so using .setCoder().
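This error means Beam could not infer a coder for one of the PCollections. A likely culprit here is PCollection<Clo>, since Clo is a plain class with no coder registered. One common fix (an assumption about this pipeline, not something confirmed in the thread) is to declare Clo as implementing java.io.Serializable and annotate it with @DefaultCoder(SerializableCoder.class), or to call .setCoder(SerializableCoder.of(Clo.class)) on pojos. SerializableCoder relies on standard Java serialization, so the stdlib-only sketch below simply checks that a Serializable Clo survives a serialization round trip; the CoderCheck helper is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Date;

// Clo made Serializable. In the Beam pipeline you would also annotate it with
// @DefaultCoder(SerializableCoder.class); that annotation is omitted here to stay stdlib-only.
class Clo implements Serializable {
    private static final long serialVersionUID = 1L;
    public String Outlet;
    public String CatLib;
    public String ProdKey;
    public Date Week;
    public String SalesComponent;
    public String DuetoValue;
    public String PrimaryCausalKey;
    public Float CausalValue;
    public Integer ModelIteration;
    public Integer Published;
}

// Hypothetical helper: encodes and decodes an object with plain Java serialization,
// the mechanism SerializableCoder builds on.
class CoderCheck {
    static <T extends Serializable> T roundTrip(T value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                @SuppressWarnings("unchecked")
                T copy = (T) in.readObject();
                return copy;
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("Serialization round trip failed", e);
        }
    }
}
```

If Clo did not implement Serializable, writeObject would throw java.io.NotSerializableException, which is the same failure SerializableCoder would hit at runtime.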
Answer:
1> What should I implement in ExtractObjectsFn() to convert the records to BeamRecords?
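In the processElement() method of ExtractObjectsFn, you only need to convert the CSV line from the input (a String) into the Clo type. Split the string on the comma delimiter (","), which returns an array. Read the CSV values from that array and use them to construct the Clo object.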
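Note that TextIO.read() already emits one element per line, so each c.element() is a single CSV line and splitting on "\n" is unnecessary. Also, Clo as declared has public fields but no setters, and Week, CausalValue, ModelIteration and Published must be parsed from strings. The sketch below pulls the conversion into a hypothetical static helper, CloParser.fromCsvLine, assuming the column order used in the question, plain comma separation with no quoted commas, no header row, and a Week format of yyyy-MM-dd (the actual format of WeeklyDueto.csv is not shown, so adjust WEEK_PATTERN).

```java
import java.io.Serializable;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;

// Same fields as in the question; Serializable so a SerializableCoder can encode it.
class Clo implements Serializable {
    private static final long serialVersionUID = 1L;
    public String Outlet;
    public String CatLib;
    public String ProdKey;
    public Date Week;
    public String SalesComponent;
    public String DuetoValue;
    public String PrimaryCausalKey;
    public Float CausalValue;
    public Integer ModelIteration;
    public Integer Published;
}

// Hypothetical helper that turns one CSV line (one TextIO element) into a Clo.
class CloParser {
    // ASSUMPTION: Week looks like 2017-10-02; change this to match the real file.
    private static final String WEEK_PATTERN = "yyyy-MM-dd";

    static Clo fromCsvLine(String line) {
        String[] temp = line.split(",", -1); // -1 keeps trailing empty columns
        if (temp.length != 10) {
            throw new IllegalArgumentException("Expected 10 columns but got " + temp.length + ": " + line);
        }
        Clo clo = new Clo();
        clo.Outlet = temp[0].trim();
        clo.CatLib = temp[1].trim();
        clo.ProdKey = temp[2].trim();
        clo.Week = parseWeek(temp[3].trim());
        clo.SalesComponent = temp[4].trim();
        clo.DuetoValue = temp[5].trim();
        clo.PrimaryCausalKey = temp[6].trim();
        clo.CausalValue = Float.valueOf(temp[7].trim());
        clo.ModelIteration = Integer.valueOf(temp[8].trim());
        clo.Published = Integer.valueOf(temp[9].trim());
        return clo;
    }

    private static Date parseWeek(String s) {
        // SimpleDateFormat is not thread-safe, so a new instance is created per call.
        SimpleDateFormat fmt = new SimpleDateFormat(WEEK_PATTERN);
        fmt.setLenient(false);
        try {
            return fmt.parse(s);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Bad Week value: " + s, e);
        }
    }
}
```

Inside ExtractObjectsFn, processElement would then reduce to c.output(CloParser.fromCsvLine(c.element())); the list clx in the original version is never used and can be dropped.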
2> How do I write the final output to a CSV file?
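The process is similar to the one above. Apply one more transform that converts each BeamRecord into a CSV line (a String) by concatenating the record's field values into a single comma-separated string. After that transform, apply the TextIO.write() transform to write the CSV lines to a file.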
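A minimal sketch of the string-building part, kept stdlib-only: a hypothetical CsvLines.toCsvLine that joins a list of field values and quotes any value containing a comma, double quote or line break (RFC 4180 style). It assumes the BeamRecord API of the Beam 2.2-era SDK, where getDataValues() returns the field values as a List<Object>; in the pipeline it could be wired in with something like out.apply(MapElements.into(TypeDescriptors.strings()).via(r -> CsvLines.toCsvLine(r.getDataValues()))) before the TextIO.write() step.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper: joins field values into one CSV line.
class CsvLines {
    static String toCsvLine(List<?> values) {
        return values.stream().map(CsvLines::escape).collect(Collectors.joining(","));
    }

    // Nulls become empty fields; values with a comma, quote or line break are quoted,
    // and embedded quotes are doubled.
    private static String escape(Object value) {
        if (value == null) {
            return "";
        }
        String s = value.toString();
        if (s.contains(",") || s.contains("\"") || s.contains("\n") || s.contains("\r")) {
            return "\"" + s.replace("\"", "\"\"") + "\"";
        }
        return s;
    }
}
```

Since the query selects only Outlet, each output line would contain a single value. A Date field would be written via Date.toString(), so format it explicitly if the CSV needs a specific date layout. Also note that TextIO.write().to(...) treats the path as a prefix and writes sharded files; withoutSharding() and withSuffix(".csv") can be added if a single .csv file is wanted.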