Scala: parsing a string into a case class

Date: 2016-02-18 01:19:45

Tags: scala apache-spark rdd string-parsing case-class

I am new to Scala and am looking for help parsing a string into this case class:

case class CategaryIds(id1: Long, id2: Long, id3: Long, secIds: Set[Long])

The data looks like this and is held in a Spark RDD:

600045,8114,31679,"{1:2:3:4}"
600034,8114,34526,
600056,8114,31679,"{1:2:3:4}"

I tried the code below, but it throws ArrayIndexOutOfBoundsException and NumberFormatException:

val fields = line.split(",").map(_.trim);
CategaryIds(fields(0).toLong,fields(1).toLong,fields(2).toLong,fields(3).replace("{","").replace("}", "").split(":").map(_.toLong).toSet)}

If there is a better way to do this, please share.

2 answers:

Answer 0 (score: 0)

Something like this works:

val fields = line.split(",").map(_.trim).toSeq

val seq = if (fields.size > 3) fields(3).split("\"{:}".toCharArray).filter(_ != "").map(_.toLong).toSet else Set[Long]()
CategaryIds(fields(0).toLong, fields(1).toLong,fields(2).toLong, seq)

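First check whether a fourth field is present, so that you don't get an ArrayIndexOutOfBoundsException; then split it on the delimiter characters and convert the pieces to Long.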
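
Assuming the lines look exactly like the sample data, the two exceptions in the question come from two details: `split(",")` drops trailing empty strings, so `600034,8114,34526,` yields only three fields, and the double quotes are never stripped, so `"1` reaches `toLong`. Here is a minimal sketch of a variation on the split-based approach. `parseLine` is a hypothetical helper name, and passing `-1` as the split limit is an alternative to the size check above, not what this answer does:

```scala
// Same case class as in the question (spelling kept as-is).
case class CategaryIds(id1: Long, id2: Long, id3: Long, secIds: Set[Long])

// Hypothetical helper, for illustration only.
def parseLine(line: String): CategaryIds = {
  // A limit of -1 keeps trailing empty strings, so a line ending in ","
  // still produces four fields and fields(3) is safe to read.
  val fields = line.split(",", -1).map(_.trim)
  val secIds = fields(3)
    .replaceAll("[\"{}]", "")   // strip quotes and braces
    .split(":")
    .filter(_.nonEmpty)         // an empty field yields Array(""), so drop it
    .map(_.toLong)
    .toSet
  CategaryIds(fields(0).toLong, fields(1).toLong, fields(2).toLong, secIds)
}
```

This still throws on lines with fewer than three commas or with non-numeric ids, so it only suits input that is known to be well-formed.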

Answer 1 (score: 0)

A regex is probably a better fit:

val r = """(\d*),(\d*),(\d*),(?:"\{(.*)\}")?""".r

"""600045,8114,31679,"{1:2:3:4}"""" match {
  case r(a,b,c,d) => println(s"a:$a, b:$b, c:$c, d:$d")
  case _ => println("no match")
}

"""600034,8114,34526,""" match {
  case r(a,b,c,d) => println(s"a:$a, b:$b, c:$c, d:$d")
  case _ => println("no match")
}

In the REPL this prints:

r: scala.util.matching.Regex = (\d*),(\d*),(\d*),(?:"\{(.*)\}")?
a:600045, b:8114, c:31679, d:1:2:3:4
a:600034, b:8114, c:34526, d:null

You can use it like this:

val r = """(\d*),(\d*),(\d*),(?:"\{(.*)\}")?""".r

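somelines.map {
  // The fourth group is null when the line has no "{...}" part.
  case r(a, b, c, null) =>
    CategaryIds(a.toLong, b.toLong, c.toLong, Set.empty[Long])
  // Otherwise split the captured "1:2:3:4" on ':' and convert each piece.
  case r(a, b, c, d) =>
    CategaryIds(a.toLong, b.toLong, c.toLong, d.split(":").map(_.toLong).toSet)
}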
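
The partial function above throws a MatchError on any line the regex does not match. A sketch of a more defensive variant follows; it is an illustration, not part of the original answer. `LinePattern` and `parseOpt` are hypothetical names, the pattern uses `\d+` instead of `\d*` so that empty id fields are rejected up front, and a plain `Seq` stands in for the RDD:

```scala
import scala.util.Try

case class CategaryIds(id1: Long, id2: Long, id3: Long, secIds: Set[Long])

// \d+ rather than \d*: an empty id would otherwise match and then fail in toLong.
val LinePattern = """(\d+),(\d+),(\d+),(?:"\{(.*)\}")?""".r

// Hypothetical helper: Some(record) for well-formed lines, None otherwise.
def parseOpt(line: String): Option[CategaryIds] = line.trim match {
  case LinePattern(a, b, c, d) =>
    // d is null when the optional group did not take part in the match.
    val parts = Option(d).map(_.split(":").toSeq).getOrElse(Seq.empty)
    // Try guards against non-numeric pieces inside the braces and Long overflow.
    Try {
      CategaryIds(a.toLong, b.toLong, c.toLong,
        parts.filter(_.nonEmpty).map(_.toLong).toSet)
    }.toOption
  case _ => None
}

val sample = Seq(
  "600045,8114,31679,\"{1:2:3:4}\"",
  "600034,8114,34526,",
  "not,a,valid,line"
)

// Lines that fail to parse are dropped instead of aborting the whole job.
val parsed = sample.flatMap(line => parseOpt(line))
```

With an RDD of strings, the same function could presumably be passed to `flatMap` in the same way, so that malformed lines are skipped rather than failing the task.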