Reading a file in GCS line by line and appending the line number to each line using FileBasedReader

Time: 2017-06-28 10:21:31

Tags: google-cloud-storage google-cloud-dataflow

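I am trying to read a file in GCS line by line, append the line number to the end of each line, and then write the PCollection back out (basically, the goal is to index the PCollection). I wrote the code below for this, but it produces errors. Here are my classes extending FileBasedSource and FileBasedReader: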

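private static class LineSource extends FileBasedSource<String> {
  public LineSource(String fileOrPattern) {
    super(fileOrPattern);
  }
  // Note: the abstract methods of FileBasedSource (such as
  // createSingleFileReader()) are not implemented in this snippet.
} // this closing brace was missing in the snippet as posted

private static class LineReader extends FileBasedSource.FileBasedReader<String> {
  public LineReader(LineSource source) {
    super(source);
  }
  // Note: the abstract methods of FileBasedReader (such as startReading()
  // and readNextRecord()) are not implemented in this snippet.
}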

The code I wrote to read the file line by line using these classes is:

try {
  // LineSource and LineReader are not generic, so they take no type arguments.
  LineSource f = new LineSource(fileName);
  LineReader b = new LineReader(f);
  b.startReading(channel);
  int index = 0;
  while (b.readNextRecord()) {
    LOG.info("FileBasedSource: " + b.getCurrent());
    // Emit the line with its 0-based index appended.
    c.output(b.getCurrent() + "," + index);
    index++;
  }
  b.close();
} catch (IOException e) {
  e.printStackTrace();
}

But I get these errors:

java:[115,49] ')' expected
[ERROR] flatFileTest.java:[115,57] illegal start of expression
[ERROR] flatFileTest.java:[175,2] reached end of file while parsing

I think the startReading() method needs to be implemented again. I am relatively new to Dataflow. Can you help?
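For reference, the indexing loop itself can be sketched without any Dataflow classes, using only the JDK. This is a minimal illustration, not a FileBasedReader implementation: the class name LineIndexer and the method indexLines are hypothetical, and the in-memory channel stands in for whatever ReadableByteChannel would be passed to startReading(). It also assumes the whole file is read sequentially by a single reader; if the file is split across workers, a per-reader counter no longer yields a global line number.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the Dataflow SDK.
final class LineIndexer {
    // Reads every line from the channel and appends ",<index>" (0-based) to it.
    static List<String> indexLines(ReadableByteChannel channel) {
        List<String> indexed = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                Channels.newReader(channel, StandardCharsets.UTF_8.name()))) {
            long index = 0;
            String line;
            while ((line = reader.readLine()) != null) {
                indexed.add(line + "," + index);
                index++;
            }
        } catch (IOException e) {
            // Rethrow unchecked so callers do not need a throws clause.
            throw new UncheckedIOException(e);
        }
        return indexed;
    }

    public static void main(String[] args) {
        // In-memory stand-in for a file; a real job would open the GCS object instead.
        ReadableByteChannel channel = Channels.newChannel(
                new ByteArrayInputStream(
                        "alpha\nbeta\ngamma\n".getBytes(StandardCharsets.UTF_8)));
        for (String row : indexLines(channel)) {
            System.out.println(row);
        }
    }
}
```

Inside a reader, the same counter would be advanced each time a record is read, so that the current record can be returned with its index appended.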

0 answers:

No answers