Problem using Spark map together with a broadcast variable in a program

Time: 2017-10-26 09:30:09

Tags: scala hadoop apache-spark

I am trying to join two datasets. The first dataset is:

1/2/2009 6:17,iphone,800,Mastercard,carolina
1/2/2009 4:53,cloth,200,Visa,Betina
1/2/2009 13:08,cloth,100,Mastercard,Federica e Andrea
1/3/2009 14:44,blender,160,Visa,Gouya
1/4/2009 12:56,samsung,3600,Visa,Gerd W 
1/4/2009 13:19,htc,1200,Visa,LAURENCE
1/4/2009 20:11,iphone,999,Mastercard,Fleur
1/2/2009 20:09,tmobile,81,Mastercard,adam
1/4/2009 13:17,iphone,400,Cash,Renee Elisabeth

The other dataset is:

Mastercard,MS
Visa,VS

I want to join the two datasets and get output like the following:

(htc,VS)
(iphone,MS)
(iphone,NULL)

Here is my approach:

  def mapCard(cardname:String):String={
    if(cardname.isEmpty()){
      return "NONE"     
    }
    else
      return cardname
  }
 def main(args: Array[String]): Unit = {
    val source = scala.io.Source.fromFile("bc.txt")
    val keymap = scala.collection.mutable.Map[String, String]()
    for (line <- source.getLines) {
      val Array(country, capital) = line.split(",").map { _.trim() }
      keymap += country -> capital
    }

    println(keymap)

    val conf = new SparkConf().setMaster("local[2]").setAppName("AAA")
    val sparkcontext = new SparkContext(conf)
    val countriesCache = sparkcontext.broadcast(keymap)

    val file = sparkcontext.textFile("salesdata.csv")

    val a = file.map { line => line.split(",") }
                .map { line => {                        
                              var columns = line(3)                                           
                              if(countriesCache.value.contains(columns) )
                                {
                                columns.map { x => ( line(1),countriesCache.value(columns) ) }
                              }                                
                              else 
                                 columns.map { x => (line(1),"NULL") }

                               }
                     }
    a.foreach(x=> println(x.mkString(",")))
  }} 

This does not give me the expected output. Please point out the problem. Instead, it prints the following:

  

(htc,VS),(htc,VS),(htc,VS),(htc,VS)
(iphone,MS),(iphone,MS),(iphone,MS),(iphone,MS),(iphone,MS),(iphone,MS),(iphone,MS),(iphone,MS),(iphone,MS),(iphone,MS)
(cloth,VS),(cloth,VS),(cloth,VS),(cloth,VS)

  

1 answer:

Answer 0 (score: 1)

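I think the problem is that you are iterating over the characters of your string in these lines. `columns` is a String, so `columns.map` runs once per character and returns one tuple per character (4 for "Visa", 10 for "Mastercard"):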

columns.map { x => ( line(1),countriesCache.value(columns) ) }

columns.map { x => (line(1),"NULL") }

Just use the tuples directly:

( line(1),countriesCache.value(columns) )

(line(1),"NULL")
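
The difference can be sketched in plain Scala, without Spark. This is only an illustration: the names `CardLookupSketch`, `buggyPair`, `fixedPair`, and the hard-coded `cards` map are hypothetical, and `cards` stands in for `countriesCache.value` under the assumption that it holds the two entries shown in the question.

```scala
object CardLookupSketch {
  // Stand-in for countriesCache.value (assumption: same two entries as the question's lookup data).
  val cards: Map[String, String] = Map("Mastercard" -> "MS", "Visa" -> "VS")

  // Mirrors the question's code: card is a String, so map runs once per character
  // and the result holds as many identical tuples as the card name has characters.
  def buggyPair(fields: Array[String]): IndexedSeq[(String, String)] = {
    val card = fields(3)
    card.map(_ => (fields(1), cards.getOrElse(card, "NULL")))
  }

  // Mirrors the answer: build exactly one tuple per record, falling back to "NULL"
  // when the card type is missing from the lookup map.
  def fixedPair(fields: Array[String]): (String, String) =
    (fields(1), cards.getOrElse(fields(3), "NULL"))

  def main(args: Array[String]): Unit = {
    val row = "1/4/2009 13:19,htc,1200,Visa,LAURENCE".split(",")
    println(buggyPair(row).mkString(","))
    println(fixedPair(row))
  }
}
```

In the Spark job, the expression in `fixedPair` would sit directly in the second `.map` closure, with `countriesCache.value` in place of `cards`; `getOrElse` is just a compact alternative to the `contains` check with an `else` branch.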