I'm reading a file of weighted directed edges (source node, target node, weight); the first part seems to work fine:
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.functions._
object Q2 {
  case class Edge(src: Int, tgt: Int, weight: Int)
  case class Node(node: Int, weight: Int)

  def main(args: Array[String]) {
    val sc = new SparkContext(new SparkConf().setAppName("Q2"))
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    // read the file to dataframe edges as class Edge
    val edges = sc.textFile("hdfs://localhost:8020" + args(0)).map(_.split("\t")).map(e => Edge(e(0).toInt, e(1).toInt, e(2).toInt)).toDF()
    edges.registerTempTable("tempEdges")
The assignment is to compute the net outgoing message weight per node. Continuing from the above, I successfully built two dataframes, one with each node's total incoming weight and one with its total outgoing weight, and joined them... I also got it working by taking a unionAll of the incoming and outgoing dataframes (with the incoming weights negated) and summing them... so the problem itself is solved, but along the way I hit an issue that led me to try creating a Node class and mapping the new dataframe onto it:
val in = edges.groupBy("tgt").agg(sum("weight")).map(n => Node(n(0).toInt,n(1).toInt)).toDF()
Doing so produces this error:
/home/cloudera/hw3-vm/q2/src/main/scala/edu/gatech/cse6242/Q2.scala:30: error: value toInt
is not a member of Any
[INFO] val in = fdf.groupBy("tgt").agg(sum("weight")).map(n => Node(n(0).toInt,n(1).toInt)).toDF()
pointing at n(0).toInt.
How can something that starts out as an Int end up as type Any? How do I cast it back to Int, or better yet, keep it from becoming Any in the first place?
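For reference, the unionAll variant I described above looked roughly like this (column names follow my Edge schema above; the exact aliases are from memory, so treat this as a sketch rather than my verbatim code):

// outgoing weights stay positive, incoming weights are negated, then union and sum per node
val out = edges.groupBy("src").agg(sum("weight").as("w")).withColumnRenamed("src", "node")
val inNeg = edges.groupBy("tgt").agg((sum("weight") * -1).as("w")).withColumnRenamed("tgt", "node")
val net = out.unionAll(inNeg).groupBy("node").agg(sum("w").as("netWeight"))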
Answer 0 (score: 0):
Whenever we use map, or any other iterating function, on a dataframe, each element of the iteration is treated as the trait Row. So when you do

val in = edges.groupBy("tgt").agg(sum("weight")).map(n => Node(n(0).toInt,n(1).toInt)).toDF()

n is a Row, and when you try to fetch data from it via n(0), n(1) you are calling the apply() method of trait Row, which is defined as
/**
* Returns the value at position i. If the value is null, null is returned. The following
* is a mapping between Spark SQL types and return types:
*
* {{{
* BooleanType -> java.lang.Boolean
* ByteType -> java.lang.Byte
* ShortType -> java.lang.Short
* IntegerType -> java.lang.Integer
* FloatType -> java.lang.Float
* DoubleType -> java.lang.Double
* StringType -> String
* DecimalType -> java.math.BigDecimal
*
* DateType -> java.sql.Date
* TimestampType -> java.sql.Timestamp
*
* BinaryType -> byte array
* ArrayType -> scala.collection.Seq (use getList for java.util.List)
* MapType -> scala.collection.Map (use getJavaMap for java.util.Map)
* StructType -> org.apache.spark.sql.Row
* }}}
*/
def apply(i: Int): Any = get(i)
/**
* Returns the value at position i. If the value is null, null is returned. The following
* is a mapping between Spark SQL types and return types:
*
* {{{
* BooleanType -> java.lang.Boolean
* ByteType -> java.lang.Byte
* ShortType -> java.lang.Short
* IntegerType -> java.lang.Integer
* FloatType -> java.lang.Float
* DoubleType -> java.lang.Double
* StringType -> String
* DecimalType -> java.math.BigDecimal
*
* DateType -> java.sql.Date
* TimestampType -> java.sql.Timestamp
*
* BinaryType -> byte array
* ArrayType -> scala.collection.Seq (use getList for java.util.List)
* MapType -> scala.collection.Map (use getJavaMap for java.util.Map)
* StructType -> org.apache.spark.sql.Row
* }}}
*/
def get(i: Int): Any
So that makes it clear why Any is returned, and there is no toInt method defined on Any. Trait Row does, however, define getAs(), as follows:

/**
* Returns the value at position i.
* For primitive types if value is null it returns 'zero value' specific for primitive
* ie. 0 for Int - use isNullAt to ensure that value is not null
*
* @throws ClassCastException when data type does not match.
*/
def getAs[T](i: Int): T = get(i).asInstanceOf[T]

So you can do

val in = edges.groupBy("tgt").agg(sum("weight")).map(n => Node(n.getAs[Int](0),n.getAs[Int](1))).toDF()
//in: org.apache.spark.sql.DataFrame = [node: int, weight: int]
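One caveat: sum over an integer column normally comes back as a bigint, i.e. a boxed Long inside the Row, so getAs[Int] on that column may throw a ClassCastException at runtime. If it does, read it as a Long and convert, for example:

val in = edges.groupBy("tgt").agg(sum("weight")).map(n => Node(n.getAs[Int](0), n.getAs[Long](1).toInt)).toDF()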
Or you can use .asInstanceOf[Int].
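That alternative would look something like this (again allowing for the summed column coming back as a Long, as noted above):

val in = edges.groupBy("tgt").agg(sum("weight")).map(n => Node(n(0).asInstanceOf[Int], n(1).asInstanceOf[Long].toInt)).toDF()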
I hope the answer is helpful.
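If the goal is to keep the value from ever becoming Any (the second part of the question), one option not covered in the answer above is to stay in the DataFrame API and cast/rename instead of mapping through a case class; a sketch:

// no map and no Row, so nothing is ever typed as Any; the summed weight is cast back down to int
val in = edges.groupBy("tgt").agg(sum("weight").cast("int").as("weight")).withColumnRenamed("tgt", "node")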