How to handle index values

Time: 2018-04-24 10:01:46

Tags: scala apache-spark apache-spark-sql sparkcore

This is the data, stored as a text file. I need to find the highest salary for each city.

first_name  last_name  city         county      salary
------------------------------------------------------
James       Butt       New Orleans  Orleans     250000
Josephine   Darakjy    Brighton     Livingston  300000
Art         Venere     Bridgeport   Gloucester  400000
Leota       Dilliard   Bridgeport   Gloucester  430000

> val scq = sc.textFile("path.txt")

> scq.flatMap(al=>al.split("\n")).sortBy(_._5,ascending = false).collect.take(5).foreach(println)
// sorting on salary 

But I get the error "value _5 is not a member of String", and when I use toString it gives the error "value _5 is not a member of char". How should I handle this?

1 answer:

Answer 0: (score: 0)

Try this:

> val scq = sc.textFile("path.txt")
> val d = scq.map(_.split("\t")).sortBy(_.apply(4), ascending = false)

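This yields an RDD[Array[String]]. Note that the salary field is still a String here, so the sort is lexicographic; it matches numeric order for this sample only because every salary has the same number of digits. If you want to treat the rows as tuples, you can do the following: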

> val d1 = d.map(c => (c(0), c(1), c(2), c(3), c(4))) // Prefer case class over this always
> d1.collect.foreach(println) // print the tuples (d1), not the arrays (d)

This will produce the following output:

(Leota,Dilliard,Bridgeport,Gloucester,430000)
(Art,Venere,Bridgeport,Gloucester,400000)
(Josephine,Darakjy,Brighton,Livingston,300000)
(James,Butt,New Orleans,Orleans,250000)
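
The code above sorts every row by salary, whereas the question asks for the highest salary per city. A minimal sketch of that step follows, assuming the file is tab-separated (as the split("\t") above implies) and that the header and dashed lines should be skipped. The names SalaryByCity, Person, parseLine and maxSalaryByCity are hypothetical, introduced only for illustration. The logic runs on a local collection so it can be tried without a cluster; a comment shows the RDD pipeline it would correspond to.

```scala
import scala.util.Try

object SalaryByCity {
  // Hypothetical record type; field names follow the column headers in the question.
  final case class Person(firstName: String, lastName: String,
                          city: String, county: String, salary: Int)

  // Parses one tab-separated line. Returns None for the header row, the dashed
  // separator, or any line that does not have exactly five fields with an
  // integer in the last one.
  def parseLine(line: String): Option[Person] =
    line.split("\t").map(_.trim) match {
      case Array(first, last, city, county, salary) =>
        Try(salary.toInt).toOption.map(Person(first, last, city, county, _))
      case _ => None
    }

  // Highest salary per city, computed on a local collection.
  // On an RDD[String] (for example sc.textFile("path.txt")) the equivalent would be:
  //   rdd.flatMap(parseLine(_)).map(p => (p.city, p.salary)).reduceByKey(_ max _)
  def maxSalaryByCity(lines: Seq[String]): Map[String, Int] =
    lines.flatMap(parseLine(_))
      .groupBy(_.city)
      .map { case (city, people) => city -> people.map(_.salary).max }

  def main(args: Array[String]): Unit = {
    val sample = Seq(
      "first_name\tlast_name\tcity\tcounty\tsalary",
      "--------------------------------------------------------",
      "James\tButt\tNew Orleans\tOrleans\t250000",
      "Josephine\tDarakjy\tBrighton\tLivingston\t300000",
      "Art\tVenere\tBridgeport\tGloucester\t400000",
      "Leota\tDilliard\tBridgeport\tGloucester\t430000"
    )
    // Sort by city name only to make the printed order deterministic.
    maxSalaryByCity(sample).toSeq.sortBy(_._1).foreach(println)
    // (Bridgeport,430000)
    // (Brighton,300000)
    // (New Orleans,250000)
  }
}
```

Converting the salary to Int before comparing avoids the string-ordering pitfall, and reducing by key keeps one value per city instead of sorting the whole dataset.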