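Does anyone know how to compare consecutive records in Scalding when creating a schema? I'm looking at Tutorial 6, and suppose I want to print a person's age whenever the data in record #2 is greater than in record #1 (and likewise for every following pair of records).
For example: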
R1: John 30
R2: Kim 55
R3: Mark 20
if Rn.age > R(n-1).age then output ..., which would result in R2: Kim 55
Edit: Looking at the code, I just realized it is a Scala Enumeration, so my question is: how do I compare records in a Scala Enumeration?
class Tutorial6(args : Args) extends Job(args) {
/** When a data set has a large number of fields, and we want to specify those fields conveniently
in code, we can use, for example, a Tuple of Symbols (as most of the other tutorials show), or a List of Symbols.
Note that Tuples can only be used if the number of fields is at most 22, since Scala Tuples cannot have more
than 22 elements. Another alternative is to use Enumerations, which we show here **/
object Schema extends Enumeration {
val first, last, phone, age, country = Value // arbitrary number of fields
}
import Schema._
Csv("tutorial/data/phones.txt", separator = " ", fields = Schema)
.read
.project(first,age)
.write(Tsv("tutorial/data/output6.tsv"))
}
Answer 0 (score: 2)
It seems the implicit conversion for Enumeration#Value is missing, so you can define it yourself:
import cascading.tuple.Fields
implicit def valueToFields(v: Enumeration#Value): Fields = v.toString
object Schema extends Enumeration {
val first, last, phone, age, country = Value // arbitrary number of fields
}
import Schema._
var current = Int.MaxValue
Csv("tutorial/data/phones.txt", separator = " ", fields = Schema)
.read
.map(age -> ('current, 'previous)) { a: String =>
val previous = current
current = a.toInt
current -> previous
}
.filter('current, 'previous) { age: (Int, Int) => age._1 > age._2 }
.project(first, age)
.write(Tsv("tutorial/data/output6.tsv"))
In the end, we want the result to be the same as that of the following:
Csv("tutorial/data/phones.txt", separator = " ", fields = Schema)
.read
.map(new Fields("age") -> new Fields("current", "previous")) { a: String =>
val previous = current
current = a.toInt
current -> previous
}
.filter(new Fields("current", "previous")) { age: (Int, Int) =>
age._1 > age._2
}
.project(new Fields("first", "age"))
.write(Tsv("tutorial/data/output6.tsv"))
The implicit conversions provided by Scalding let you write shorter versions of these new Fields(...) expressions.
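The valueToFields conversion defined at the top of this answer works because an Enumeration value's toString is the name of the val it was assigned to. A minimal sketch in plain Scala, with no Scalding dependency (SchemaNames is a hypothetical helper object used only for illustration):

```scala
// Same enumeration as in the tutorial.
object Schema extends Enumeration {
  val first, last, phone, age, country = Value
}

// Hypothetical helper object, only here to hold the results.
object SchemaNames {
  // Each Value's toString is the name of the val it was assigned to.
  val ageName: String = Schema.age.toString

  // values is ordered by id, which here is declaration order.
  val allNames: List[String] = Schema.values.toList.map(_.toString)
}
```

These strings are what valueToFields hands over to be turned into Fields.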
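An implicit conversion is just a view, which the compiler uses when you pass an argument that is not of the expected type but can be converted to the appropriate type through that view. For example, because map() expects a pair of Fields and you pass it a pair of Symbols, Scala searches for an implicit conversion from Symbol -> Symbol to Fields -> Fields. A short explanation of views can be found here.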
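The view mechanism can be sketched without Scalding. In the example below, FieldNames is a hypothetical stand-in for cascading.tuple.Fields, and describe plays the role of map(); neither name comes from Scalding itself, and the sketch assumes Scala 2 style implicit defs.

```scala
import scala.language.implicitConversions

// Hypothetical stand-in for cascading.tuple.Fields so the sketch runs without Scalding.
final case class FieldNames(names: List[String])

object Views {
  // The view: lets a pair of Symbols be used where a pair of FieldNames is expected.
  implicit def symbolPairToFieldNames(p: (Symbol, Symbol)): (FieldNames, FieldNames) =
    (FieldNames(List(p._1.name)), FieldNames(List(p._2.name)))
}

object ViewDemo {
  import Views._

  // Plays the role of map(): it only accepts a pair of FieldNames.
  def describe(fs: (FieldNames, FieldNames)): String =
    fs._1.names.mkString(",") + " -> " + fs._2.names.mkString(",")

  val symbols: (Symbol, Symbol) = Symbol("age") -> Symbol("current")

  // The argument has the wrong type, so the compiler inserts symbolPairToFieldNames.
  val viaView: String = describe(symbols)

  // What the compiler effectively rewrites the call above to.
  val explicit: String = describe(symbolPairToFieldNames(symbols))
}
```

Both calls produce the same string; the only difference is who writes the conversion, you or the compiler.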
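Scalding 0.8.5 introduced conversions from a product of Enumeration#Value to Fields, but it lacks the conversion from a pair of values. The develop branch now provides the latter as well.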
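As a sanity check on the comparison logic itself, the same "keep a record if its age exceeds the previous record's age" rule can be sketched on a plain in-memory list. Person and ConsecutiveCompare are hypothetical names used only for this illustration. Note that the var-based pipeline above presumably relies on the rows reaching a single task in file order, whereas this sketch makes the ordering explicit.

```scala
// Hypothetical record type; the tutorial's data has more columns.
final case class Person(name: String, age: Int)

object ConsecutiveCompare {
  // Keep every record whose age is greater than that of the record before it.
  // The first record has no predecessor, so it is never kept.
  def olderThanPrevious(records: List[Person]): List[Person] =
    records.zip(records.drop(1)).collect {
      case (previous, current) if current.age > previous.age => current
    }

  // The three records from the question.
  val sample: List[Person] =
    List(Person("John", 30), Person("Kim", 55), Person("Mark", 20))

  val result: List[Person] = olderThanPrevious(sample)
}
```

For the sample from the question, result contains only Kim 55, matching the expected output.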