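I receive a message from a process and am trying to map it to a case class. The message is a comma-separated string enclosed in pipe characters, like this: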
|id1,5,2010-06-19,27.40,2010-06-20,35.40,2010-06-21,8.50,2010-06-22,23.40,2010-06-23,57.40,TX5|
The message is packed as follows:
1. id
2. number of occurrences of 3 and 4 together
3. date (repeats along with 4, as many times as 2 specifies)
4. amount (repeats along with 3, as many times as 2 specifies)
5. code (the last field)
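Although there are 5 high-level fields, fields 3 and 4 repeat as a pair, as many times as field 2 specifies.
Here are a few more examples to make this clearer: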
|id2,7,2010-06-19,56.40,2010-06-20,23.76,2010-06-21,12.50,2010-06-22,87.12,2010-06-23,52.90,2010-06-24,35.70,2010-06-25,72.80,TX3|
|id3,4,2010-06-19,87.40,2010-06-20,32.40,2010-06-21,21.50,2010-06-22,73.40,TX2|
|id4,6,2010-06-19,56.12,2010-06-20,66.43,2010-06-21,23.12,2010-06-22,87.12,2010-06-23,34.90,2010-06-24,55.00,FT3|
I was able to strip the pipe characters from the start and end, then parse out the first and last fields:
scala> val str="id1,5,2010-06-19,27.40,2010-06-20,35.40,2010-06-21,8.50,2010-06-22,23.40,2010-06-23,57.40,TX5"
str: String = id1,5,2010-06-19,27.40,2010-06-20,35.40,2010-06-21,8.50,2010-06-22,23.40,2010-06-23,57.40,TX5
scala> val (id,code) = (str.split(",")(0), str.split(",").last)
id: String = id1
code: String = TX5
scala>
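But how do I map the remaining fields so that they fit a case class?
Note that this is different from "Scala: Parsing Array of String to a case class", where the message has a fixed number of columns and maps easily to a case class.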
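Answer 0 (score: 1)
You haven't specified what the case class should look like. Here is one approach that is reasonably tolerant of badly formatted input strings.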
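import scala.util.Try

case class CC(id: String,
              datePrice: Seq[(String, Double)],
              code: String)

def mkCC(dataStr: String): CC = {
  val dataArr = dataStr.split(",")
  // The first and last fields still carry the enclosing pipes; strip them.
  val id   = dataArr.head.filter(_ != '|')
  val code = dataArr.last.filter(_ != '|')
  // Field 1 is the pair count. Pairs start at index 2 and take two slots each.
  // A pair that is missing or has a non-numeric amount is silently skipped.
  val dps = Try {
    val len = dataArr(1).toInt
    Seq.range(2, len * 2 + 2, 2)
      .flatMap(idx => Try((dataArr(idx), dataArr(idx + 1).toDouble)).toOption)
  }.getOrElse(Seq()) // a missing or non-numeric count yields no pairs
  CC(id, dps, code)
}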
Usage:
val data1="|id2,7,2010-06-19,56.40,2010-06-20,23.76,2010-06-21,12.50,2010-06-22,87.12,2010-06-23,52.90,2010-06-24,35.70,2010-06-25,72.80,TX3|"
val data2="|id3,4,2010-06-19,87.40,2010-06-20,32.40,2010-06-21,21.50,2010-06-22,73.40,TX2|"
val data3="|id4,6,2010-06-19,56.12,2010-06-20,66.43,2010-06-21,23.12,2010-06-22,87.12,2010-06-23,34.90,2010-06-24,55.00,FT3|"
val cc1 :CC = mkCC(data1)
val cc2 :CC = mkCC(data2)
val cc3 :CC = mkCC(data3)
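Because mkCC skips anything it cannot parse, a malformed message still produces a CC, just with fewer pairs. If you would rather reject such input, a stricter variant is possible. The following is only a sketch under stated assumptions: the names Entry, Message, parseEntries and parseStrict are made up for illustration, and it assumes the layout described in the question (a count, then exactly that many date/amount pairs, then one code field).

```scala
import scala.util.Try

// Hypothetical names, not part of the answer above.
final case class Entry(date: String, amount: Double)
final case class Message(id: String, entries: Seq[Entry], code: String)

// Turns a flat list [date, amount, date, amount, ...] into entries,
// reporting the first pair that cannot be parsed.
def parseEntries(fields: List[String]): Either[String, List[Entry]] = {
  val parsed: List[Either[String, Entry]] = fields.grouped(2).toList.map {
    case List(date, amt) =>
      Try(amt.toDouble).toOption.map(Entry(date, _)).toRight(s"bad amount: $amt")
    case other =>
      Left(s"incomplete pair: ${other.mkString(",")}")
  }
  parsed.collectFirst { case Left(err) => err } match {
    case Some(err) => Left(err)
    case None      => Right(parsed.collect { case Right(entry) => entry })
  }
}

def parseStrict(raw: String): Either[String, Message] = {
  // Drop the surrounding pipes, then split on commas.
  val fields = raw.trim.stripPrefix("|").stripSuffix("|").split(",").toList
  fields match {
    case id :: countStr :: rest if rest.nonEmpty =>
      for {
        count <- Try(countStr.toInt).toOption.toRight(s"bad count: $countStr")
        // After the count we expect `count` pairs plus exactly one code field.
        _ <- Either.cond(rest.length == count * 2 + 1, (),
               s"expected ${count * 2 + 1} fields after the count, found ${rest.length}")
        entries <- parseEntries(rest.init)
      } yield Message(id, entries, rest.last)
    case _ =>
      Left("too few fields")
  }
}
```

With the strings defined above, parseStrict(data2) returns a Right holding four entries, while a message whose count does not match the number of pairs returns a Left describing the mismatch. This relies on Either being right-biased, which assumes Scala 2.12 or later.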