I'm new to Scala and looking for help parsing a string into a case class:
case class CategaryIds(id1: Long, id2: Long, id3: Long, secIds: Set[Long])
The data looks like this and is held in a Spark RDD:
600045,8114,31679,"{1:2:3:4}"
600034,8114,34526,
600056,8114,31679,"{1:2:3:4}"
I tried the code below, but it throws ArrayIndexOutOfBoundsException and NumberFormatException:
val fields = line.split(",").map(_.trim);
CategaryIds(fields(0).toLong,fields(1).toLong,fields(2).toLong,fields(3).replace("{","").replace("}", "").split(":").map(_.toLong).toSet)
If there is a better way to achieve this, please share.
Answer 0 (score: 0)
Something like this works:
val fields = line.split(",").map(_.trim).toSeq
val seq = if (fields.size > 3) fields(3).split("\"{:}".toCharArray).filter(_ != "").map(_.toLong).toSet else Set[Long]()
CategaryIds(fields(0).toLong, fields(1).toLong,fields(2).toLong, seq)
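This first checks whether the fourth field is present at all, so you don't get an ArrayIndexOutOfBoundsException, then splits that field on the delimiter characters (quote, braces, colon) and converts the pieces to Longs.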
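For context on why the original code fails: `split(",")` drops trailing empty strings, so a line ending in a comma yields only three fields and `fields(3)` is out of bounds, while the leftover quote characters around `{1:2:3:4}` make `toLong` throw a NumberFormatException. Here is a minimal sketch of the same idea wrapped in a function. The names `LineParser` and `parseLine` are hypothetical (not from the question), and it assumes the first three fields are always valid numbers.

```scala
case class CategaryIds(id1: Long, id2: Long, id3: Long, secIds: Set[Long])

object LineParser {
  // Hypothetical helper: parses one comma-separated line into CategaryIds.
  def parseLine(line: String): CategaryIds = {
    // split(",") discards trailing empty strings, so "a,b,c," gives 3 fields, not 4.
    val fields = line.split(",").map(_.trim)
    val secIds: Set[Long] =
      if (fields.length > 3)
        // Split on quote, braces and colon, drop the empty pieces, then convert.
        fields(3).split(Array('"', '{', '}', ':')).filter(_.nonEmpty).map(_.toLong).toSet
      else
        Set.empty[Long]
    CategaryIds(fields(0).toLong, fields(1).toLong, fields(2).toLong, secIds)
  }
}
```

With an RDD of lines this would presumably be applied as `rdd.map(LineParser.parseLine)`.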
Answer 1 (score: 0)
A regex may be a better fit:
val r = """(\d*),(\d*),(\d*),(?:"\{(.*)\}")?""".r
"""600045,8114,31679,"{1:2:3:4}"""" match {
case r(a,b,c,d) => println(s"a:$a, b:$b, c:$c, d:$d")
case _ => println("no match")
}
"""600034,8114,34526,""" match {
case r(a,b,c,d) => println(s"a:$a, b:$b, c:$c, d:$d")
case _ => println("no match")
}
r: scala.util.matching.Regex = (\d*),(\d*),(\d*),(?:"\{(.*)\}")?
a:600045, b:8114, c:31679, d:1:2:3:4
a:600034, b:8114, c:34526, d:null
You can use it like this:
val r = """(\d*),(\d*),(\d*),(?:"\{(.*)\}")?""".r
somelines.map{
case r(a,b,c,null) =>
CategaryIds(a.toLong, b.toLong, c.toLong, Set())
case r(a,b,c,d) =>
CategaryIds(a.toLong, b.toLong, c.toLong, d.split(":").map(_.toLong).toSet)
}
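Note that the `map` above throws a MatchError for any line the regex does not match. A hedged variant that returns an `Option` instead might look like the sketch below. The names `RegexParser` and `parse` are hypothetical, and the pattern is slightly stricter than the one above (`\d+` instead of `\d*`, and only digits and colons inside the braces), which is an assumption about the data rather than something stated in the question. It also assumes every number fits in a Long.

```scala
case class CategaryIds(id1: Long, id2: Long, id3: Long, secIds: Set[Long])

object RegexParser {
  // Three mandatory numeric fields, then an optional quoted "{...}" group.
  private val LinePattern = """(\d+),(\d+),(\d+),(?:"\{([\d:]*)\}")?""".r

  // Hypothetical helper: returns None for lines that do not match the pattern.
  def parse(line: String): Option[CategaryIds] = line.trim match {
    case LinePattern(a, b, c, d) =>
      // d is null when the optional group is absent; Option(null) is None.
      val secIds = Option(d)
        .map(_.split(":").filter(_.nonEmpty).map(_.toLong).toSet)
        .getOrElse(Set.empty[Long])
      Some(CategaryIds(a.toLong, b.toLong, c.toLong, secIds))
    case _ => None
  }
}
```

On a plain Scala collection, `lines.flatMap(RegexParser.parse)` keeps only the lines that parsed; wrapping the absent group in `Option` also avoids relying on the order of the `null` and non-null cases.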