A function that turns a letter into a word

Date: 2019-06-12 23:53:52

Tags: eclipse scala apache-spark

This is my text file, which is the input to the program:

Id Title Copy 
B2002010 gyh 1 
D2001001 abc 12 
M2003005 zxc 3 
D2002003 qwe 13 
M2001002 efg 1 
D2001004 asd 6 
D2003005 zxc 3 
M2001006 wer 6 
D2001006 wer 6 
B2004008 sxc 10 
D2002007 sdf 9 
D2004008 sxc 10

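The ID has the format Xyyyyrrr, where:

  • X is the item type: B => Book or M => Magazine (the sample data also uses D => Dictionary)
  • yyyy is the year
  • rrr is a random number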
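
The ID layout described above can be sketched as a small plain-Scala parser, independent of Spark. The names `ParsedId`, `IdParser` and `parseId` are hypothetical and do not come from the post, and the `D => Dictionary` mapping is inferred from the sample data and the expected output rather than from the format description.

```scala
// Hypothetical names (ParsedId, IdParser, parseId); not part of the original post.
final case class ParsedId(kind: String, year: Int, serial: String)

object IdParser {
  // Prefix letter -> word. 'D' -> "Dictionary" is an assumption inferred from the sample data.
  private val kinds: Map[Char, String] = Map(
    'B' -> "Book",
    'M' -> "Magazine",
    'D' -> "Dictionary"
  )

  // Returns None unless the ID is one known letter followed by exactly seven digits.
  def parseId(id: String): Option[ParsedId] =
    if (id.length != 8 || !id.drop(1).forall(_.isDigit)) None
    else kinds.get(id.head).map { kind =>
      // Characters 1 to 4 are the year, characters 5 to 7 the random number.
      ParsedId(kind, id.substring(1, 5).toInt, id.substring(5))
    }
}
```

With these assumptions, `IdParser.parseId("D2002003")` yields `Some(ParsedId(Dictionary,2002,003))`, while the header token `Id` yields `None`. The random number is kept as a string so that leading zeros survive.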

What I need to do is replace the first letter of the ID with the word it stands for.

For example:

(D2002,24) --> Dictionary,2002,24

My Spark project is in Eclipse, and I am using Maven and Scala IDE.

package bd.spark_app 
import org.apache.spark.SparkConf 
import org.apache.spark.SparkContext 
import org.apache.spark.SparkContext._ 
import org.apache.spark.sql.SQLContext 
import org.apache.spark.sql._ 
import org.apache.spark.sql.types.IntegerType 
import scala.io.Source 
import org.apache.spark.sql.functions._ 
import scala.collection.mutable.WrappedArray 
import org.apache.log4j._ 
import org.apache.spark.sql.types.{StructType, StructField, StringType} 
import org.apache.spark.sql.Row 
import scala.Array 

object alla {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local").setAppName("trying")
    val sc = new SparkContext(conf)
    val x = sc.textFile("/home/hadoopusr/sampledata")

    // Key: the ID without its last three characters (e.g. "D2001"); value: the Copy count
    val converted = x.map(_.split(" ")).map(r => (r(0).dropRight(3), r(2).toInt))
    // Sum the copies per key
    val result = converted.reduceByKey(_ + _)

    sc.stop()
  }
}

The result is

(M2001,7) (D2001,24) (M2003,3) (D2003,3) (D2002,22) (D2004,10) (B2002,1) (B2004,10)

I would like the result to be

(Magazine, 2001, 7)
(Dictionary, 2001, 24)
(Magazine, 2003, 3)
(Dictionary, 2003, 3)

and so on.

A simple function would help.

1 Answer:

Answer 0: (score: 2)

Does this help?

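// rdd is the RDD[String] of input lines (x in the question)
rdd.map(_.split(" "))
   .map(str => ((str.head.head match {   // first character of the ID
        case 'M' => "Magazine"
        case 'B' => "Book"
        case 'D' => "Dictionary"
        case _ => ???                    // throws NotImplementedError for any other prefix
      }, str.head.drop(1).dropRight(3).toInt), str.last.toInt))  // key: (word, year); value: copies
   .reduceByKey(_ + _)                   // sum the copies per (word, year)
   .map(tuple => (tuple._1._1, tuple._1._2, tuple._2))  // flatten to (word, year, copies)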

Sample output (verified):

(Magazine,2003,3), (Dictionary,2001,24), (Dictionary,2003,3), (Book,2002,1), (Magazine,2001,7), (Book,2004,10), (Dictionary,2002,22), (Dictionary,2004,10)
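
One caveat with the match above: `case _ => ???` throws a `NotImplementedError` for any unrecognized prefix, which would include the `Id Title Copy` header row if it is present in the file. The following is a minimal sketch of a more forgiving variant, written against plain Scala collections so it can be checked without a Spark context. The names `CopiesByKindAndYear`, `parseLine` and `summarize` are hypothetical, and skipping bad rows silently is an assumption about the desired behavior.

```scala
// Hypothetical names; a local-collections sketch, not the answer's Spark code.
object CopiesByKindAndYear {
  // Prefix letter -> word, mirroring the pattern match in the answer.
  private val kinds: Map[Char, String] =
    Map('M' -> "Magazine", 'B' -> "Book", 'D' -> "Dictionary")

  // Turns one input line into ((word, year), copies).
  // Returns None for the header row, unknown prefixes, or malformed lines.
  def parseLine(line: String): Option[((String, Int), Int)] =
    line.trim.split("\\s+") match {
      case Array(id, _, copies)
          if id.length == 8 && id.drop(1).forall(_.isDigit) &&
            copies.nonEmpty && copies.forall(_.isDigit) =>
        kinds.get(id.head).map(kind => ((kind, id.substring(1, 5).toInt), copies.toInt))
      case _ => None
    }

  // Local equivalent of reduceByKey(_ + _) followed by flattening the key.
  def summarize(lines: Seq[String]): Set[(String, Int, Int)] =
    lines
      .flatMap(line => parseLine(line))
      .groupBy(_._1)
      .map { case ((kind, year), rows) => (kind, year, rows.map(_._2).sum) }
      .toSet
}
```

On an RDD the same `parseLine` could be plugged in with `rdd.flatMap(line => parseLine(line))` before `reduceByKey(_ + _)`, so that rows which do not parse are dropped instead of failing the job.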