TextLine does not give byte offset (Scalding)

Date: 2018-10-08 04:05:18

Tags: scalding

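Contrary to the documentation, TextLine does not give the byte offset of each line; it gives the line number instead. The output is pasted below.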

TextLine(input).write(Tsv(output))

0       This is the 100th Etext file presented by Project Gutenberg, and
1       is presented in cooperation with World Library, Inc., from their
2       Library of the Future and Shakespeare CDROMS.  Project Gutenberg
3       often releases Etexts that are NOT placed in the Public Domain!!
4       
5       Shakespeare
6       
7       *This Etext has certain copyright implications you should read!*

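The tutorial example makes it clear that the field it emits is the line number, yet the documentation keeps saying byte offset. Is there an out-of-the-box class in Scalding that reads lines together with their byte offsets?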

/**
Scalding tutorial part 1.
In part 0, we made a copy of hello.txt, but it wasn't a perfect copy:
it was annotated with line numbers.
That's because the data stream coming out of a TextLine source actually
has two fields: one, called "line", has the actual line of text. The other,
called "num", has the line number in the file. When you write these
tuples to a TextLine, it naively outputs them both on each line.
We can ask scalding to select just the "line" field from the pipe, using the
project() method. When we refer to a data stream's fields, we use Scala symbols,
like this: 'line.
To run this job:
  scripts/scald.rb --local tutorial/Tutorial1.scala
Check the output:
  cat tutorial/data/output1.txt
**/
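To make the difference between the two numbering schemes concrete, here is a minimal plain-Scala sketch (no Scalding or Cascading involved; the object `LineOffsets` and its methods are hypothetical names). It pairs each line of a string with its 0-based line number, as in the output above, and alternatively with the byte offset of the line's first byte, assuming UTF-8 text and single-byte "\n" line terminators.

```scala
import java.nio.charset.StandardCharsets

// Hypothetical helper object; not part of Scalding or Cascading.
object LineOffsets {
  // Splits on "\n"; a trailing "\n" ends the last line instead of starting an empty one.
  private def splitLines(text: String): Seq[String] = {
    val parts = text.split("\n", -1).toSeq
    if (text.endsWith("\n")) parts.dropRight(1) else parts
  }

  // Pairs each line with its 0-based line number, like the TextLine output in the question.
  def withLineNumbers(text: String): Seq[(Long, String)] =
    splitLines(text).zipWithIndex.map { case (line, i) => (i.toLong, line) }

  // Pairs each line with the byte offset of its first byte, assuming UTF-8 text
  // and single-byte "\n" terminators (no "\r\n").
  def withByteOffsets(text: String): Seq[(Long, String)] = {
    val lines = splitLines(text)
    // Running total: each line contributes its UTF-8 byte length plus 1 for the "\n".
    val offsets = lines.scanLeft(0L) { (offset, line) =>
      offset + line.getBytes(StandardCharsets.UTF_8).length + 1
    }
    offsets.zip(lines) // zip drops the final running total, which belongs to no line
  }

  def main(args: Array[String]): Unit = {
    val sample = "Shakespeare\n\n*This Etext*\n"
    withLineNumbers(sample).foreach { case (n, l) => println(s"$n\t$l") }
    withByteOffsets(sample).foreach { case (o, l) => println(s"$o\t$l") }
  }
}
```

For the sample text, the first variant numbers the lines 0, 1, 2, while the second reports offsets 0, 12, 13, so a byte offset and a line number agree only for the very first line.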
