Scalding: comparing consecutive records

Time: 2013-06-16 05:01:50

Tags: scala enums scalding

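Does anyone know how to compare consecutive records in Scalding when creating a schema? I am looking at Tutorial 6, and suppose I want to print a person's age whenever the data in record #2 is greater than the data in record #1 (and likewise for all following records).

For example: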

R1: John 30
R2: Kim 55
R3: Mark 20 

If Rn.age > R(n-1).age then output ..., which results in R2: Kim 55
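
The rule above can be sketched in plain Scala, outside Scalding, to pin down the expected result. This is only an illustration: the `Person` case class and the `olderThanPrevious` helper are hypothetical names that do not appear in the tutorial.

```scala
// Hypothetical record type, introduced only for this illustration.
final case class Person(name: String, age: Int)

object ConsecutiveCompare {
  // Pair each record with the one just before it and keep the later record
  // whenever its age is greater than the earlier record's age.
  def olderThanPrevious(records: Seq[Person]): Seq[Person] =
    records.zip(records.drop(1)).collect {
      case (prev, cur) if cur.age > prev.age => cur
    }

  def main(args: Array[String]): Unit = {
    val records = Seq(Person("John", 30), Person("Kim", 55), Person("Mark", 20))
    // Prints "Kim 55" for the sample data above.
    olderThanPrevious(records).foreach(p => println(s"${p.name} ${p.age}"))
  }
}
```

The first record has no predecessor, so it is never emitted.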

Edit: Looking at the code, I just realized that it is a Scala Enumeration, so my question is: how do I compare records in a Scala Enumeration?

class Tutorial6(args : Args) extends Job(args) {
  /** When a data set has a large number of fields, and we want to specify those fields conveniently
    in code, we can use, for example, a Tuple of Symbols (as most of the other tutorials show), or a List of Symbols.
    Note that Tuples can only be used if the number of fields is at most 22, since Scala Tuples cannot have more
    than 22 elements. Another alternative is to use Enumerations, which we show here **/

  object Schema extends Enumeration {
    val first, last, phone, age, country = Value // arbitrary number of fields
  }

  import Schema._

  Csv("tutorial/data/phones.txt", separator = " ", fields = Schema)
    .read
    .project(first,age)
    .write(Tsv("tutorial/data/output6.tsv"))
}

1 answer:

Answer 0 (score: 2)

It seems that an implicit conversion for Enumeration#Value is missing, so you can define it yourself:

import cascading.tuple.Fields
implicit def valueToFields(v: Enumeration#Value): Fields = v.toString

object Schema extends Enumeration {
  val first, last, phone, age, country = Value // arbitrary number of fields
}

import Schema._

var current = Int.MaxValue

Csv("tutorial/data/phones.txt", separator = " ", fields = Schema)
  .read
  .map(age -> ('current, 'previous)) { a: String =>
    val previous = current
    current = a.toInt
    current -> previous
  }
  .filter('current, 'previous) { age: (Int, Int) => age._1 > age._2 }
  .project(first, age)
  .write(Tsv("tutorial/data/output6.tsv"))
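
To see what the stateful `map` and `filter` steps above do, the same logic can be traced on an ordinary in-memory sequence. This is a sketch under one stated assumption: the records are visited one at a time in file order, which is what the `var current` in the answer relies on. `StatefulTrace` and `keptAges` are hypothetical names used only here.

```scala
object StatefulTrace {
  // Mirrors the answer's map/filter pair on an in-memory list of age strings.
  // Assumption: elements are processed sequentially, in order.
  def keptAges(ages: List[String]): List[Int] = {
    var current = Int.MaxValue // same seed as the answer: the first record can never pass the filter
    ages
      .map { a =>
        val previous = current
        current = a.toInt
        current -> previous // (current, previous), as in the answer
      }
      .filter { case (cur, prev) => cur > prev }
      .map(_._1)
  }

  def main(args: Array[String]): Unit =
    println(keptAges(List("30", "55", "20"))) // List(55)
}
```

Seeding `current` with `Int.MaxValue` means the first record is compared against a value it cannot exceed, so only records with a real predecessor can be emitted.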

Ultimately, we want the result to be the same as the following:

Csv("tutorial/data/phones.txt", separator = " ", fields = Schema)
  .read
  .map((new Fields("age"), new Fields("current", "previous"))) { a: String =>
    val previous = current
    current = a.toInt
    current -> previous
  }
  .filter(new Fields("current", "previous")) { age: (Int, Int) =>
    age._1 > age._2
  }
  .project(new Fields("first", "age"))
  .write(Tsv("tutorial/data/output6.tsv"))

The implicit conversions provided by Scalding let you write shorter versions of these new Fields(...) expressions.

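An implicit conversion is simply a view, which the compiler uses when the argument you pass is not of the expected type but can be converted to the appropriate type through that view. For example, because map() expects a pair of Fields but is passed a pair of Symbols, Scala searches for an implicit conversion from Symbol -> Symbol to Fields -> Fields. A short explanation of views can be found here.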
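
The view mechanism can be shown in a small, Scalding-free sketch. `FieldNames` below is a hypothetical stand-in for cascading.tuple.Fields, and `symbolPairToFieldNames` is an illustrative conversion, not the one Scalding actually ships.

```scala
import scala.language.implicitConversions

object ViewDemo {
  // Hypothetical stand-in for cascading.tuple.Fields, for illustration only.
  final case class FieldNames(names: List[String])

  // The view: converts a pair of Symbols into a pair of FieldNames.
  implicit def symbolPairToFieldNames(p: (Symbol, Symbol)): (FieldNames, FieldNames) =
    (FieldNames(List(p._1.name)), FieldNames(List(p._2.name)))

  // Expects a pair of FieldNames, much as map() is described as expecting a pair of Fields.
  def describe(fs: (FieldNames, FieldNames)): String =
    s"${fs._1.names.mkString(",")} -> ${fs._2.names.mkString(",")}"

  // A pair of Symbols is passed where a pair of FieldNames is expected,
  // so the compiler inserts symbolPairToFieldNames automatically.
  def demo: String = {
    val pair: (Symbol, Symbol) = (Symbol("age"), Symbol("current"))
    describe(pair)
  }

  def main(args: Array[String]): Unit =
    println(demo) // age -> current
}
```

Without the implicit definition in scope, the call `describe(pair)` would not compile, which is the situation the answer describes for Enumeration#Value.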

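Scalding 0.8.5 introduced conversions from a product of Enumeration#Value to Fields, but conversions from a pair of values were missing. The develop branch now provides the latter as well.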