Spark filter function with Map

时间:2016-09-23 06:18:40

标签: scala apache-spark

I am new to Spark and am running into a problem when trying to filter a Map. I am trying to remove the header from a .csv file and extract certain records, but for some reason my filter condition is not working.

val dataWithHeader = sc.textFile("/user/skv/airlines.csv")  
val headerAndRows = dataWithHeader.map(x => x.split(",").map(_.trim))
val Header = headerAndRows.first    
val data = headerAndRows.filter(_(0) != Header(0))

val maps = data.map( x => Header.zip(x).toMap)       
 //result looks like
 //res0: Array[scala.collection.immutable.Map[String,String]] =
 // Array(Map(Code -> "19031", Description -> "Mackey International Inc.: MAC"),
 //       Map(Code -> "19032", Description -> "Munz Northern Airlines Inc.: XY"),
 //now when I try to filter the map with the condition below, the filter does not work:

val result = maps.filter(x => x("Code") != "19031") 

airlines.csv looks like:

 Code,Description
"19031","Mackey International Inc.: MAC"
"19032","Munz Northern Airlines Inc.: XY"
"19033","Cochise Airlines Inc.: COC"   
"19034","Golden Gate Airlines Inc.: GSA"  
"19035","Aeromech Inc.: RZZ" 
"19036","Golden West Airlines Co.: GLW"  
"19037","Puerto Rico Intl Airlines: PRN"  
"19038","Air America Inc.: STZ"  
"19039","Swift Aire Lines Inc.: SWT"

2 Answers:

Answer 0 (score: 3)

It looks like your values still contain a pair of double quotes (because you read them, quotes included, from your csv).

Try replacing

val headerAndRows = dataWithHeader.map(x => x.split(",").map(_.trim))

with

val headerAndRows = dataWithHeader.map(x => x.split(",").map(_.trim.replace("\"", "")))
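To see why stripping the quotes fixes the filter, here is a minimal plain-Scala sketch of the same parsing logic on a local collection (no SparkContext needed; the sample lines mirror airlines.csv, and `QuoteStripDemo` is just a hypothetical wrapper name):

```scala
// Minimal sketch: the same split/trim/replace pipeline on an ordinary Seq.
object QuoteStripDemo {
  def main(args: Array[String]): Unit = {
    val lines = Seq(
      "Code,Description",
      "\"19031\",\"Mackey International Inc.: MAC\"",
      "\"19032\",\"Munz Northern Airlines Inc.: XY\""
    )
    // Split each line and strip both whitespace and the surrounding quotes.
    val headerAndRows = lines.map(_.split(",").map(_.trim.replace("\"", "")))
    val header = headerAndRows.head
    val data = headerAndRows.filter(_(0) != header(0))
    val maps = data.map(row => header.zip(row).toMap)
    // With the quotes stripped, the comparison matches as intended,
    // leaving only the non-19031 rows.
    val result = maps.filter(m => m("Code") != "19031")
    result.foreach(println)
  }
}
```

Because RDD `map` and `filter` take ordinary Scala functions, the exact same lambdas work unchanged once `lines` is replaced by the `sc.textFile(...)` RDD.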

Answer 1 (score: 1)

Since your data contains double quotes, you can get this working in two ways:

  1. Remove the double quotes from your data by replacing them (as in Raphael Roth's answer).

  2. Compare your values with the double quotes included, e.g.

 val result = maps.filter(x => {
      x("Code") != "\"19031\""
    })
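A quick plain-Scala check of option 2: after `split` and `trim`, the stored field is the quoted string, so only the escaped comparison distinguishes it from the bare value (sketch using one sample line from the csv):

```scala
// The raw CSV field keeps its surrounding quotes after split/trim,
// so the stored value is "\"19031\"", not "19031".
val code = "\"19031\",\"Mackey International Inc.: MAC\""
  .split(",")
  .map(_.trim)
  .head

// Comparing against the bare string never matches...
assert(code != "19031")
// ...while comparing against the quoted form does.
assert(code == "\"19031\"")
```

This is why the original `x("Code") != "19031"` filter removed nothing: it was true for every row.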