Question

How can I apply this class to a data file that contains multiple records?

class Iris(val sepal_len: Double, val sepal_width: Double, val petal_len: Double,
           val petal_width: Double, var sepal_area: Double, val species: String) {
  require(sepal_area == sepal_len * sepal_width, "wrong values")

  def this(sepal_len: Double,
           sepal_width: Double,
           petal_len: Double,
           petal_width: Double,
           species: String) = {
    this(sepal_len, sepal_width, petal_len, petal_width, sepal_len * sepal_width, species)
  }

  override def toString: String = "Iris(" + sepal_len + "," + sepal_width + "," + petal_len + "," + petal_width +
    "," + sepal_area + "," + species + ")"
}

val ir = new Iris(1.2,3.4,4.5,5.0,4.08,"setosa")
output => ir: Iris = Iris(1.2,3.4,4.5,5.0,4.08,setosa)

val ir1 = new Iris(1.2,3.4,4.5,5.0,"setosa")
output => ir1: Iris = Iris(1.2,3.4,4.5,5.0,4.08,setosa)

Please give me some ideas.

Answer 1

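Based on your example, I assume sepal_area is not a required field of your class, and I have answered accordingly; if it is required, the code is easy to change.

In my example answer I store a collection of Iris instances. You could also create a wrapper case class like this: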

case class Irises(irises: Seq[Iris])

Example CSV file:

1,5.1,3.5,1.4,5.1,Iris-setosa
2,4.9,3,1.4,9.8,Iris-setosa
3,4.7,3.2,1.3,14.1,Iris-setosa
4,4.6,3.1,1.5,Iris-setosa
5,5,3.6,1.4,25,Iris-setosa
6,5.4,3.9,1.7,32.4,Iris-setosa

Code:

object Demo extends App {

  case class Iris(sepal_len: Double, sepal_width: Double, petal_len: Double,
             petal_width: Double, var sepal_area: Double = 0, species: String) {

    // if undefined on constructing the class
    sepal_area = if(sepal_area == 0) sepal_len * sepal_width else sepal_area

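    // round to the nearest whole number so small floating-point errors are ignored
    def format(double: Double): Double = {
      double.round.toDouble
    }

    // floating-point arithmetic is inexact: 4.7 * 3 evaluates to 14.100000000000001,
    // so the rounded values are compared instead of the raw ones
    require(format(sepal_area) == format(sepal_len * sepal_width), "wrong values")

    // string interpolation is much more readable than concatenation
    override def toString: String = s"Iris($sepal_len,$sepal_width,$petal_len,$petal_width,$sepal_area,$species)"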
  }

  // file.csv found in {project-name}/file.csv
  val bufferedSource = io.Source.fromFile("file.csv")
  val seq = bufferedSource.getLines().map {
    line =>

      // for each line in csv file, split by comma and remove all whitespace around each part
      val cols = line.split(",").map(_.trim)

      // define parts
      val sepal_len = cols.head.toDouble
      val sepal_width = cols(1).toDouble
      val petal_len = cols(2).toDouble
      val petal_width = cols(3).toDouble
      val species = if(cols.length == 6) cols(5) else cols(4)

      // if sepal_area is defined
      if (cols.length == 6) Iris(sepal_len, sepal_width, petal_len, petal_width, cols(4).toDouble, species)
      // if sepal_area is not defined
      else Iris(sepal_len, sepal_width, petal_len, petal_width, species = species)
  }.toList // force evaluation so every line is read before the source is closed

  seq.foreach(println)
  // Iris(1.0,5.1,3.5,1.4,5.1,Iris-setosa)
  // Iris(2.0,4.9,3.0,1.4,9.8,Iris-setosa)
  // Iris(3.0,4.7,3.2,1.3,14.1,Iris-setosa)
  // Iris(4.0,4.6,3.1,1.5,18.4,Iris-setosa)
  // Iris(5.0,5.0,3.6,1.4,25.0,Iris-setosa)
  // Iris(6.0,5.4,3.9,1.7,32.4,Iris-setosa)

  val newSeq = seq.toSeq
  newSeq.foreach(println)

  // close the source once you've finished with it
  bufferedSource.close()
}
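
The check in the code above rounds both values to the nearest whole number, which hides the floating-point noise but also accepts areas that are off by a few tenths. A tolerance-based comparison is one possible alternative. The sketch below is only an illustration: the `AreaCheck` object, the `approxEqual` helper and the `Epsilon` value are assumptions, not part of the original answer.

```scala
object AreaCheck {
  // Assumed tolerance; pick a value that matches the precision of your data
  val Epsilon: Double = 1e-9

  // True when a and b differ by no more than eps
  def approxEqual(a: Double, b: Double, eps: Double = Epsilon): Boolean =
    math.abs(a - b) <= eps
}
```

Inside the case class, the requirement could then be written as `require(AreaCheck.approxEqual(sepal_area, sepal_len * sepal_width), "wrong values")`.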

Answer 2

Use foreach and build a collection of objects, one per line. Perhaps a List.
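
A minimal sketch of that idea, assuming each line holds five comma-separated columns (sepal_len, sepal_width, petal_len, petal_width, species). `IrisRecord` and `ForeachLoader` are hypothetical names used here as stand-ins for the class in the question.

```scala
import scala.collection.mutable.ListBuffer

// Hypothetical stand-in for the Iris class from the question
final case class IrisRecord(sepalLen: Double, sepalWidth: Double,
                            petalLen: Double, petalWidth: Double,
                            species: String) {
  // derived value, so it can never disagree with length * width
  def sepalArea: Double = sepalLen * sepalWidth
}

object ForeachLoader {
  // Walks the lines with foreach and collects one object per valid line
  def load(lines: Iterator[String]): List[IrisRecord] = {
    val buffer = ListBuffer.empty[IrisRecord]
    lines.foreach { line =>
      val cols = line.split(",").map(_.trim)
      // assumed layout: four numeric columns followed by the species;
      // lines with a different column count are skipped,
      // non-numeric values would still throw NumberFormatException
      if (cols.length == 5)
        buffer += IrisRecord(cols(0).toDouble, cols(1).toDouble,
                             cols(2).toDouble, cols(3).toDouble, cols(4))
    }
    buffer.toList
  }
}
```

With a real file, the iterator would come from `scala.io.Source.fromFile("file.csv").getLines()`, and the source should be closed after `load` returns.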

Answer 3

I suggest using a case class for this kind of data structure, with a companion object that overloads the constructor.

case class Iris(sepal_len: Double, sepal_width: Double, petal_len: Double, petal_width: Double, sepal_area: Double, species: String) {
    require(sepal_area == sepal_len * sepal_width, "wrong values")
}

object Iris {
    def apply(sepal_len: Double, sepal_width: Double, petal_len: Double, petal_width: Double, species: String) =
        new Iris(sepal_len, sepal_width, petal_len, petal_width, sepal_len * sepal_width, species)
}

val ir = Iris(1.2, 3.4, 4.5, 5.0, 4.08, "setosa")
val ir1 = Iris(1.2, 3.4, 4.5, 5.0, "setosa")
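
To apply this to a data file, one option is to map every line through the overloaded `apply`. The sketch below repeats the case class so that it is self-contained; `fromCsvLine` and `fromFile` are hypothetical helpers, and the five-column layout (sepal_len, sepal_width, petal_len, petal_width, species) is an assumption about the file.

```scala
import scala.io.Source

final case class Iris(sepal_len: Double, sepal_width: Double, petal_len: Double,
                      petal_width: Double, sepal_area: Double, species: String) {
  require(sepal_area == sepal_len * sepal_width, "wrong values")
}

object Iris {
  // Overload from the answer: the area is computed, so the require always holds
  def apply(sepal_len: Double, sepal_width: Double, petal_len: Double,
            petal_width: Double, species: String): Iris =
    new Iris(sepal_len, sepal_width, petal_len, petal_width, sepal_len * sepal_width, species)

  // Hypothetical helper: parses one line of the assumed five-column layout
  def fromCsvLine(line: String): Iris = {
    val cols = line.split(",").map(_.trim)
    Iris(cols(0).toDouble, cols(1).toDouble, cols(2).toDouble, cols(3).toDouble, cols(4))
  }

  // Hypothetical helper: reads the whole file, then closes it
  def fromFile(path: String): List[Iris] = {
    val source = Source.fromFile(path)
    try source.getLines().map(fromCsvLine).toList // toList reads everything before close
    finally source.close()
  }
}
```

Calling `Iris.fromFile("file.csv")` would then return one `Iris` per line; a malformed line throws, which may or may not be what you want.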

Scala: how to apply this class to a data file that contains multiple records