Why does my data end up as type Any when it started out as Int?

Date: 2018-03-11 08:08:14

Tags: scala types spark-dataframe

I am reading a file of weighted directed edges (source node and target node); the first part seems to work fine:

import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.functions._

object Q2 {

  case class Edge(src: Int, tgt: Int, weight: Int)
  case class Node(node: Int, weight: Int)

  def main(args: Array[String]) {
    val sc = new SparkContext(new SparkConf().setAppName("Q2"))
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    // read the file to dataframe edges as class Edge
    val edges = sc.textFile("hdfs://localhost:8020" + args(0)).map(_.split("\t")).map(e => Edge(e(0).toInt, e(1).toInt, e(2).toInt)).toDF()
    edges.registerTempTable("tempEdges")

The assignment is to compute the net messages (weights) output by each node. Continuing from the above, I successfully built two dataframes, one with each node's total incoming weight and one with its total outgoing weight, and joined them... I also got it working by taking a unionAll of the in and out dataframes (with the incoming weights negated) and summing them (a rough sketch of that approach appears at the end of this question)... So the problem is solved, but along the way I ran into an issue that led me to try creating a Node class and mapping the new dataframe onto it:

val in =  edges.groupBy("tgt").agg(sum("weight")).map(n => Node(n(0).toInt,n(1).toInt)).toDF()

Doing so results in this error:

/home/cloudera/hw3-vm/q2/src/main/scala/edu/gatech/cse6242/Q2.scala:30: error: value toInt is not a member of Any
[INFO]     val in =  fdf.groupBy("tgt").agg(sum("weight")).map(n => Node(n(0).toInt,n(1).toInt)).toDF()

with the pointer at n(0).toInt.

How can something that starts out as an Int end up with type Any? And how do I convert it back to an Int, or better yet, keep it from becoming Any in the first place?
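
For reference, a rough sketch (illustrative only, not my exact code) of the unionAll approach mentioned above, assuming the edges dataframe built earlier and counting incoming weight as negative:

// outgoing weight, counted as positive
val outgoing = edges.select($"src".as("node"), $"weight")
// incoming weight, counted as negative
val incoming = edges.select($"tgt".as("node"), (-$"weight").as("weight"))
// net weight per node = total outgoing minus total incoming
val net = outgoing.unionAll(incoming).groupBy("node").agg(sum("weight").as("net_weight"))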

1 answer:

Answer 0 (score: 0)

Whenever we use map or any other row-by-row function on a DataFrame, each element of the iteration is treated as a trait Row.

So when you do

val in =  edges.groupBy("tgt").agg(sum("weight")).map(n => Node(n(0).toInt,n(1).toInt)).toDF()

n is a Row, and when you try to get the data with n(0) or n(1), the apply() method of trait Row is called, which is defined as

 /**
   * Returns the value at position i. If the value is null, null is returned. The following
   * is a mapping between Spark SQL types and return types:
   *
   * {{{
   *   BooleanType -> java.lang.Boolean
   *   ByteType -> java.lang.Byte
   *   ShortType -> java.lang.Short
   *   IntegerType -> java.lang.Integer
   *   FloatType -> java.lang.Float
   *   DoubleType -> java.lang.Double
   *   StringType -> String
   *   DecimalType -> java.math.BigDecimal
   *
   *   DateType -> java.sql.Date
   *   TimestampType -> java.sql.Timestamp
   *
   *   BinaryType -> byte array
   *   ArrayType -> scala.collection.Seq (use getList for java.util.List)
   *   MapType -> scala.collection.Map (use getJavaMap for java.util.Map)
   *   StructType -> org.apache.spark.sql.Row
   * }}}
   */
  def apply(i: Int): Any = get(i)

 /**
   * Returns the value at position i. If the value is null, null is returned. The following
   * is a mapping between Spark SQL types and return types:
   *
   * {{{
   *   BooleanType -> java.lang.Boolean
   *   ByteType -> java.lang.Byte
   *   ShortType -> java.lang.Short
   *   IntegerType -> java.lang.Integer
   *   FloatType -> java.lang.Float
   *   DoubleType -> java.lang.Double
   *   StringType -> String
   *   DecimalType -> java.math.BigDecimal
   *
   *   DateType -> java.sql.Date
   *   TimestampType -> java.sql.Timestamp
   *
   *   BinaryType -> byte array
   *   ArrayType -> scala.collection.Seq (use getList for java.util.List)
   *   MapType -> scala.collection.Map (use getJavaMap for java.util.Map)
   *   StructType -> org.apache.spark.sql.Row
   * }}}
   */
  def get(i: Int): Any

So it is clear why Any comes back, and toInt is not a method defined on Any.
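
As a small standalone illustration (a hypothetical snippet using a hand-built Row, not the poster's data):

import org.apache.spark.sql.Row

val r: Row = Row(1, 2)  // a Row holding two Ints (stored boxed, as java.lang.Integer)
val a = r(0)            // a is statically typed as Any, because apply(i) returns Any
// a.toInt              // does not compile: value toInt is not a member of Any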

The getAs() method in trait Row is defined as

 /**
   * Returns the value at position i.
   * For primitive types if value is null it returns 'zero value' specific for primitive
   * ie. 0 for Int - use isNullAt to ensure that value is not null
   *
   * @throws ClassCastException when data type does not match.
   */
  def getAs[T](i: Int): T = get(i).asInstanceOf[T]

So you can do

 val in = edges.groupBy("tgt").agg(sum("weight")).map(n => Node(n.getAs[Int](0), n.getAs[Int](1))).toDF()
 //in: org.apache.spark.sql.DataFrame = [node: int, weight: int]

Or you can use .asInstanceOf[Int] to cast the type directly.
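
For completeness, a rough sketch of that cast-based variant (illustrative, not tested against the original data; it assumes the aggregated weight column comes back as a Long, which is what Spark SQL usually produces when summing an integer column):

val inCast = edges.groupBy("tgt").agg(sum("weight"))
  .map(n => Node(n(0).asInstanceOf[Int], n(1).asInstanceOf[Long].toInt))
  .toDF()

If the column were still an Int, n(1).asInstanceOf[Int] would work the same way; as the scaladoc above shows, getAs[T] is just a thin wrapper around this cast.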

I hope the answer is helpful.