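I am trying to read a file in GCS line by line, append the line number to the end of each line, and then write the resulting PCollection back out (basically, the goal is to index the PCollection). I wrote the code below, but it fails to compile. Here are my classes extending FileBasedSource and FileBasedReader: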
private static class LineSource extends FileBasedSource<String> {
    public LineSource(String fileOrPattern) {
        super(fileOrPattern);
    }
    private static class LineReader extends FileBasedSource.FileBasedReader<String> {
        public LineReader(LineSource source) {
            super(source);
        }
    }
The code I wrote to read the file line by line using these classes is:
try {
    LineSource<String> f = new LineSource(fileName);
    LineReader<String> b = new LineReader(f);
    b.startReading(channel);
    int index = 0;
    while (b.readNextRecord()) {
        LOG.info("FileBasedSource: " + b.getCurrent());
        c.output(b.getCurrent() + "," + index);
        index++;
    }
    b.close();
} catch (IOException e) {
    e.printStackTrace();
}
But I get these errors:
java:[115,49] ')' expected
[ERROR] flatFileTest.java:[115,57] illegal start of expression
[ERROR] flatFileTest.java:[175,2] reached end of file while parsing
I suppose the startReading() method needs to be implemented (overridden) as well. I am relatively new to Dataflow. Can you help?
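To make the intent concrete, here is a minimal sketch in plain Java of the behaviour I am after. It is only an illustration under assumptions: `ChannelLineReader` is a hypothetical standalone class that does not extend `FileBasedSource.FileBasedReader`, its method names merely mirror the calls used in my loop above, and UTF-8 input is assumed. It uses only the JDK, so it says nothing about how the Dataflow SDK expects these methods to be implemented.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;
import java.nio.charset.StandardCharsets;

// Hypothetical standalone reader; NOT a Dataflow SDK class.
public class ChannelLineReader {
    private BufferedReader reader;
    private String current;

    // Wrap the channel so that records can be read one line at a time.
    public void startReading(ReadableByteChannel channel) {
        reader = new BufferedReader(
                Channels.newReader(channel, StandardCharsets.UTF_8.name()));
    }

    // Advance to the next line; returns false once the input is exhausted.
    public boolean readNextRecord() throws IOException {
        current = reader.readLine();
        return current != null;
    }

    // The line most recently read by readNextRecord().
    public String getCurrent() {
        return current;
    }

    public void close() throws IOException {
        if (reader != null) {
            reader.close();
        }
    }

    public static void main(String[] args) throws IOException {
        // In-memory stand-in for the GCS file (assumption for the demo).
        byte[] data = "first\nsecond\nthird\n".getBytes(StandardCharsets.UTF_8);
        ChannelLineReader b = new ChannelLineReader();
        b.startReading(Channels.newChannel(new ByteArrayInputStream(data)));
        int index = 0;
        while (b.readNextRecord()) {
            // Prints first,0 then second,1 then third,2
            System.out.println(b.getCurrent() + "," + index);
            index++;
        }
        b.close();
    }
}
```

The counter here is local to a single reader, so the sketch only covers the case where one reader sees the whole file from start to end.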