I am trying to join two datasets. Here is the first dataset:
1/2/2009 6:17,iphone,800,Mastercard,carolina
1/2/2009 4:53,cloth,200,Visa,Betina
1/2/2009 13:08,cloth,100,Mastercard,Federica e Andrea
1/3/2009 14:44,blender,160,Visa,Gouya
1/4/2009 12:56,samsung,3600,Visa,Gerd W
1/4/2009 13:19,htc,1200,Visa,LAURENCE
1/4/2009 20:11,iphone,999,Mastercard,Fleur
1/2/2009 20:09,tmobile,81,Mastercard,adam
1/4/2009 13:17,iphone,400,Cash,Renee Elisabeth
Similarly, the other dataset is:
Mastercard,MS
Visa,VS
I want to join the two datasets and get output like this:
(htc,VS)
(iphone,MS)
(iphone,NULL)
Here is my approach:
import org.apache.spark.{SparkConf, SparkContext}

def mapCard(cardname: String): String = {
  if (cardname.isEmpty()) {
    return "NONE"
  }
  else
    return cardname
}
def main(args: Array[String]): Unit = {
  val source = scala.io.Source.fromFile("bc.txt")
  val keymap = scala.collection.mutable.Map[String, String]()
  for (line <- source.getLines) {
    val Array(country, capital) = line.split(",").map { _.trim() }
    keymap += country -> capital
  }
  println(keymap)
  val conf = new SparkConf().setMaster("local[2]").setAppName("AAA")
  val sparkcontext = new SparkContext(conf)
  val countriesCache = sparkcontext.broadcast(keymap)
  val file = sparkcontext.textFile("salesdata.csv")
  val a = file.map { line => line.split(",") }
    .map { line => {
      var columns = line(3)
      if (countriesCache.value.contains(columns)) {
        columns.map { x => (line(1), countriesCache.value(columns)) }
      }
      else
        columns.map { x => (line(1), "NULL") }
    }
    }
  a.foreach(x => println(x.mkString(",")))
}}
This does not give me the expected output. Please help me find the problem. Instead, it prints the following:
(htc,VS),(htc,VS),(htc,VS),(htc,VS)
(iphone,MS),(iphone,MS),(iphone,MS),(iphone,MS),(iphone,MS),(iphone,MS),(iphone,MS),(iphone,MS),(iphone,MS),(iphone,MS)
(cloth,VS),(cloth,VS),(cloth,VS),(cloth,VS)
Answer 0 (score: 1)
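I think the problem is that you are iterating over the characters of your string in these lines. In Scala, calling map on a String applies the function once per character, so a card name like "Visa" produces 4 tuples and "Mastercard" produces 10: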
columns.map { x => ( line(1),countriesCache.value(columns) ) }
and
columns.map { x => (line(1),"NULL") }
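To see why each record is repeated, here is a minimal sketch in plain Scala (no Spark needed). Because a String can be treated as a sequence of Chars, map on it calls the function once per character. The object name StringMapDemo and the helper repeatPerChar are hypothetical, chosen only for this illustration:

```scala
object StringMapDemo {
  // Mirrors columns.map { x => ... } from the question: x is a Char,
  // so the same pair is emitted once for every character of the card name.
  def repeatPerChar(card: String, pair: (String, String)): IndexedSeq[(String, String)] =
    card.map(_ => pair)

  def main(args: Array[String]): Unit = {
    val pairs = repeatPerChar("Visa", ("htc", "VS"))
    println(pairs.mkString(",")) // one (htc,VS) per character of "Visa"
    println(pairs.length)
  }
}
```

This is the same pattern that produced four (htc,VS) tuples for a single Visa record and ten (iphone,MS) tuples for a single Mastercard record.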
Instead, just use
( line(1),countriesCache.value(columns) )
and
(line(1),"NULL")
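With that change, each sales record yields exactly one tuple. Below is a minimal sketch of the per-record logic, written as plain Scala so it can be checked without a Spark cluster. The names JoinSketch and toPair are hypothetical. getOrElse replaces the contains/if pair, and the sample rows come from the question's data:

```scala
object JoinSketch {
  // Maps one sales record (already split on commas) to (product, cardCode).
  // Card types missing from the lookup map (e.g. "Cash") become "NULL".
  def toPair(columns: Array[String], cardCodes: scala.collection.Map[String, String]): (String, String) = {
    val product = columns(1).trim
    val cardType = columns(3).trim
    (product, cardCodes.getOrElse(cardType, "NULL"))
  }

  def main(args: Array[String]): Unit = {
    // Lookup table equivalent to the second dataset (Mastercard,MS / Visa,VS).
    val cardCodes = Map("Mastercard" -> "MS", "Visa" -> "VS")
    val sales = Seq(
      "1/4/2009 13:19,htc,1200,Visa,LAURENCE",
      "1/4/2009 20:11,iphone,999,Mastercard,Fleur",
      "1/4/2009 13:17,iphone,400,Cash,Renee Elisabeth"
    )
    // Prints (htc,VS), (iphone,MS) and (iphone,NULL), one per line.
    sales.map(_.split(",")).map(cols => toPair(cols, cardCodes)).foreach(println)
  }
}
```

Assuming countriesCache is the broadcast map from the question, the Spark job could apply the same function per record, e.g. file.map(_.split(",")).map(cols => JoinSketch.toPair(cols, countriesCache.value)), and then print each tuple directly with a.foreach(println), since each element is now a single tuple rather than a collection.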