How do I convert a formatted String to a Tuple in Scala?

Asked: 2015-07-14 03:59:36

Tags: scala tuples

I have a text file with the following contents.

//((number,(number,date)),number)
((210,(18,2015/06/28)),57.0)
((92,(60,2015/06/16)),102.89777479000209)
((46,(18,2015/06/17)),52.8940162267246)
((204,(27,2015/06/06)),75.2807019793683)

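I want to convert these lines into tuples, and I need a fast way to do it, because the list of such strings is very large.

Edit: I also want to preserve the type and structure information.

Any help would be appreciated.

3 answers:

Answer 0 (score: 3)

The super simple way: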

val splitRegex = "[(),]+".r
def f(s: String) = {
  val split = splitRegex.split(s)
  (split(1).toInt, split(2).toInt, split(3), split(4).toDouble)
}

f("((210,(18,2015/06/28)),57.0)")
// res0: (Int, Int, String, Double) = (210,18,2015/06/28,57.0)
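
The indices above start at 1 rather than 0 because the input begins with "((", which matches the separator pattern, so the split produces an empty leading element. A small sketch that makes this visible (the object name SplitDemo is only for illustration):

```scala
object SplitDemo {
  val splitRegex = "[(),]+".r

  // The leading "((" matches the separator pattern, so element 0 is "".
  // Trailing empty strings (from the final ")") are dropped by split.
  val parts: Array[String] = splitRegex.split("((210,(18,2015/06/28)),57.0)")
  // parts contains: "", "210", "18", "2015/06/28", "57.0"
}
```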

A cleaner way:

val TupleRegex = """\(\((\d+),\((\d+),(\d+/\d+/\d+)\)\),([\d.]+)\)""".r
def f(s: String) = s match {
  case TupleRegex(n1, n2, d, n3) => (n1.toInt, n2.toInt, d, n3.toDouble)
}

f("((210,(18,2015/06/28)),57.0)")
// res1: (Int, Int, String, Double) = (210,18,2015/06/28,57.0)
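
Since the question's edit asks to keep the structure, the same regex can also build the nested tuple directly. The following is only a sketch under a couple of assumptions: the name SafeTupleParser is made up, the date is kept as a String, and malformed lines are turned into None instead of a MatchError.

```scala
object SafeTupleParser {
  private val TupleRegex =
    """\(\((\d+),\((\d+),(\d+/\d+/\d+)\)\),([\d.]+)\)""".r

  // Returns None instead of throwing a MatchError when a line does not match.
  def parse(s: String): Option[((Int, (Int, String)), Double)] = s match {
    case TupleRegex(n1, n2, d, n3) =>
      // Try guards against text that matches the regex but does not convert,
      // e.g. "1.2.3" for the Double or an Int that overflows.
      scala.util.Try(((n1.toInt, (n2.toInt, d)), n3.toDouble)).toOption
    case _ => None
  }
}
```

With a collection of lines, something like lines.flatMap(l => SafeTupleParser.parse(l)) would keep only the lines that parsed.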

Answer 1 (score: 2)

I find scala-parser-combinators a good way to do this kind of thing; it is more self-documenting than splitting or regexes:

import scala.util.parsing.combinator.JavaTokenParsers
import org.joda.time.LocalDate

object MyParser extends JavaTokenParsers {
  // Whitespace is significant here: records are separated by newlines.
  override val skipWhitespace = false
  // Dates in the input are written as yyyy/MM/dd.
  def date = (wholeNumber ~ "/" ~ wholeNumber ~ "/" ~ wholeNumber) ^^ {
    case year ~ _ ~ month ~ _ ~ day =>
      new LocalDate(year.toInt, month.toInt, day.toInt)
  }
  def myNumber = decimalNumber ^^ { _.toDouble }
  // date must be tried before myNumber, since both start with digits.
  def tupleElement: Parser[Any] = date | myNumber | tuple
  def tuple: Parser[List[Any]] = "(" ~> repsep(tupleElement, ",") <~ ")"
  def data = repsep(tuple, "\n")
}

Hopefully it is obvious how to extend this. Use it like this:

scala> MyParser.parseAll(MyParser.data, """((210,(18,2015/06/28)),57.0)
 | ((92,(60,2015/06/16)),102.89777479000209)
 | ((46,(18,2015/06/17)),52.8940162267246)
 | ((204,(27,2015/06/06)),75.2807019793683)""")
res1: MyParser.ParseResult[List[List[Any]]] = [4.41] parsed: List(List(List(210.0, List(18.0, 2015-06-28)), 57.0), List(List(92.0, List(60.0, 2015-06-16)), 102.89777479000209), List(List(46.0, List(18.0, 2015-06-17)), 52.8940162267246), List(List(204.0, List(27.0, 2015-06-06)), 75.2807019793683))

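These types cannot be fully known at compile time (short of doing the parsing itself at compile time, with macros or similar). The result above is a List[List[Any]] whose elements are each a Double, a LocalDate, or another List. You can handle that at runtime with pattern matching. A better approach might be to use a sealed trait:

sealed trait TupleElement
case class NestedTuple(inner: List[TupleElement]) extends TupleElement
case class NumberElement(value: Double) extends TupleElement
case class DateElement(value: LocalDate) extends TupleElement

def myNumber = decimalNumber ^^ { d => NumberElement(d.toDouble) }
def tupleElement: Parser[TupleElement] = ... //etc.

Then, when you have a TupleElement in your code and pattern match on it, the compiler will warn you if you have not covered all the cases.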
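
To illustrate the exhaustiveness check, here is a small self-contained sketch. It makes some assumptions that are not in the answer: it uses java.time.LocalDate instead of Joda so that it runs without extra dependencies, the render function and the sample value are invented for the example, and the value is built by hand rather than produced by the parser.

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter

object TupleElements {
  sealed trait TupleElement
  case class NestedTuple(inner: List[TupleElement]) extends TupleElement
  case class NumberElement(value: Double) extends TupleElement
  case class DateElement(value: LocalDate) extends TupleElement

  private val dateFormat = DateTimeFormatter.ofPattern("yyyy/MM/dd")

  // Render an element back to text. Because the trait is sealed, leaving out
  // one of these cases makes the compiler warn that the match may not be
  // exhaustive.
  def render(e: TupleElement): String = e match {
    case NestedTuple(inner) => inner.map(render).mkString("(", ",", ")")
    case NumberElement(v)   => v.toString
    case DateElement(d)     => d.format(dateFormat)
  }

  // Hand-built value standing in for a parser result.
  val sample: TupleElement = NestedTuple(List(
    NestedTuple(List(
      NumberElement(210.0),
      NestedTuple(List(NumberElement(18.0), DateElement(LocalDate.of(2015, 6, 28)))))),
    NumberElement(57.0)))
}
```

Note that every number is a Double in this representation, so 210 renders as 210.0; a separate case class for integers would be needed to keep that distinction.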

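Answer 2 (score: 1)

Assuming the strings are all well-formed, regexes, splitting and parsing will all be very fast. You did not mention whether you want to keep the structure of the original data (and get the types) or just want a flat tuple, but either is easy: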

val strings = Array("((210,(18,2015/06/28)),57.0)",
  "((92,(60,2015/06/16)),102.89777479000209)",
  "((46,(18,2015/06/17)),52.8940162267246)",
  "((204,(27,2015/06/06)),75.2807019793683)")

val dateFormat = new java.text.SimpleDateFormat("yyyy/MM/dd")

def toUnstructuredTuple(s:String):(Int, Int, java.util.Date, Double) = {
  val noParens = s.replaceAll("[\\(\\)]", "")
  val split = noParens.split(",")

  (split(0).toInt, split(1).toInt, dateFormat.parse(split(2)), split(3).toDouble)
}

def toStructedTuple(s:String):((Int,(Int, java.util.Date)), Double) = {
  val noParens = s.replaceAll("[\\(\\)]", "")
  val split = noParens.split(",")

  ((split(0).toInt, (split(1).toInt, dateFormat.parse(split(2)))), split(3).toDouble)
}


strings.foreach { s =>
  println("%s -> %s".format(s, toUnstructuredTuple(s)))
}


strings.foreach { s =>
  println("%s -> %s".format(s, toStructedTuple(s)))
}

This results in:

benderino 21:54 $ bin/scala tuples.scala
((210,(18,2015/06/28)),57.0) -> (210,18,Sun Jun 28 00:00:00 PDT 2015,57.0)
((92,(60,2015/06/16)),102.89777479000209) -> (92,60,Tue Jun 16 00:00:00 PDT 2015,102.89777479000209)
((46,(18,2015/06/17)),52.8940162267246) -> (46,18,Wed Jun 17 00:00:00 PDT 2015,52.8940162267246)
((204,(27,2015/06/06)),75.2807019793683) -> (204,27,Sat Jun 06 00:00:00 PDT 2015,75.2807019793683)
((210,(18,2015/06/28)),57.0) -> ((210,(18,Sun Jun 28 00:00:00 PDT 2015)),57.0)
((92,(60,2015/06/16)),102.89777479000209) -> ((92,(60,Tue Jun 16 00:00:00 PDT 2015)),102.89777479000209)
((46,(18,2015/06/17)),52.8940162267246) -> ((46,(18,Wed Jun 17 00:00:00 PDT 2015)),52.8940162267246)
((204,(27,2015/06/06)),75.2807019793683) -> ((204,(27,Sat Jun 06 00:00:00 PDT 2015)),75.2807019793683)
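
One caveat: java.text.SimpleDateFormat is not thread-safe, and java.util.Date prints in the machine's time zone (hence the PDT above). If the lines may be parsed in parallel and Java 8 or later is available, java.time is one alternative. This is a sketch only; the object and method names are made up for the example.

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter

object JavaTimeTuples {
  // DateTimeFormatter is immutable and thread-safe, unlike SimpleDateFormat.
  private val dateFormat = DateTimeFormatter.ofPattern("yyyy/MM/dd")

  def toStructuredTuple(s: String): ((Int, (Int, LocalDate)), Double) = {
    // Drop all parentheses, then split the four comma-separated fields.
    val split = s.replaceAll("[()]", "").split(",")
    ((split(0).toInt, (split(1).toInt, LocalDate.parse(split(2), dateFormat))),
      split(3).toDouble)
  }
}
```

LocalDate carries no time or zone, so the result prints the same on every machine.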