I'm reading a file of weighted directed edges (source node, target node, weight); the first part seems to work fine:
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.functions._
object Q2 {
  case class Edge(src: Int, tgt: Int, weight: Int)
  case class Node(node: Int, weight: Int)

  def main(args: Array[String]) {
    val sc = new SparkContext(new SparkConf().setAppName("Q2"))
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    // read the file to dataframe edges as class Edge
    val edges = sc.textFile("hdfs://localhost:8020" + args(0)).map(_.split("\t")).map(e => Edge(e(0).toInt, e(1).toInt, e(2).toInt)).toDF()
    edges.registerTempTable("tempEdges")
The assignment is to compute the net outgoing message weight per node. Continuing from the above, I successfully built two dataframes, one with each node's total incoming weight and one with its total outgoing weight, and joined them... I also got it working by taking a unionAll of the incoming and outgoing dataframes (with the incoming weights negated) and summing them... so the problem itself is solved, but along the way I hit an issue that led me to try creating a Node class and mapping the new dataframe onto it:
val in = edges.groupBy("tgt").agg(sum("weight")).map(n => Node(n(0).toInt,n(1).toInt)).toDF()
Doing so produces this error:
/home/cloudera/hw3-vm/q2/src/main/scala/edu/gatech/cse6242/Q2.scala:30: error: value toInt
is not a member of Any
[INFO] val in = fdf.groupBy("tgt").agg(sum("weight")).map(n => Node(n(0).toInt,n(1).toInt)).toDF()
pointing at n(0).toInt.
How can something that starts out as an Int end up as type Any? How do I cast it back to Int, or better yet, keep it from becoming Any in the first place?
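For reference, the unionAll variant I described above looked roughly like this (column names follow my Edge schema above; the exact aliases are from memory, so treat this as a sketch rather than my verbatim code):

// outgoing weights stay positive, incoming weights are negated, then union and sum per node
val out = edges.groupBy("src").agg(sum("weight").as("w")).withColumnRenamed("src", "node")
val inNeg = edges.groupBy("tgt").agg((sum("weight") * -1).as("w")).withColumnRenamed("tgt", "node")
val net = out.unionAll(inNeg).groupBy("node").agg(sum("w").as("netWeight"))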
Answer 0 (score: 0):
Whenever we use map, or any other iterating function, on a dataframe, each element of the iteration is treated as the trait Row. So when you do

val in = edges.groupBy("tgt").agg(sum("weight")).map(n => Node(n(0).toInt,n(1).toInt)).toDF()

n is a Row, and when you try to fetch data from it via n(0), n(1) you are calling the apply() method of trait Row, which is defined as
/**
* Returns the value at position i. If the value is null, null is returned. The following
* is a mapping between Spark SQL types and return types:
*
* {{{
* BooleanType -> java.lang.Boolean
* ByteType -> java.lang.Byte
* ShortType -> java.lang.Short
* IntegerType -> java.lang.Integer
* FloatType -> java.lang.Float
* DoubleType -> java.lang.Double
* StringType -> String
* DecimalType -> java.math.BigDecimal
*
* DateType -> java.sql.Date
* TimestampType -> java.sql.Timestamp
*
* BinaryType -> byte array
* ArrayType -> scala.collection.Seq (use getList for java.util.List)
* MapType -> scala.collection.Map (use getJavaMap for java.util.Map)
* StructType -> org.apache.spark.sql.Row
* }}}
*/
def apply(i: Int): Any = get(i)
/**
* Returns the value at position i. If the value is null, null is returned. The following
* is a mapping between Spark SQL types and return types:
*
* {{{
* BooleanType -> java.lang.Boolean
* ByteType -> java.lang.Byte
* ShortType -> java.lang.Short
* IntegerType -> java.lang.Integer
* FloatType -> java.lang.Float
* DoubleType -> java.lang.Double
* StringType -> String
* DecimalType -> java.math.BigDecimal
*
* DateType -> java.sql.Date
* TimestampType -> java.sql.Timestamp
*
* BinaryType -> byte array
* ArrayType -> scala.collection.Seq (use getList for java.util.List)
* MapType -> scala.collection.Map (use getJavaMap for java.util.Map)
* StructType -> org.apache.spark.sql.Row
* }}}
*/
def get(i: Int): Any
So that makes it clear why Any is returned, and there is no toInt method defined on Any. Trait Row does, however, define getAs(), as follows:

/**
* Returns the value at position i.
* For primitive types if value is null it returns 'zero value' specific for primitive
* ie. 0 for Int - use isNullAt to ensure that value is not null
*
* @throws ClassCastException when data type does not match.
*/
def getAs[T](i: Int): T = get(i).asInstanceOf[T]

So you can do

val in = edges.groupBy("tgt").agg(sum("weight")).map(n => Node(n.getAs[Int](0),n.getAs[Int](1))).toDF()
//in: org.apache.spark.sql.DataFrame = [node: int, weight: int]
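One caveat: sum over an integer column normally comes back as a bigint, i.e. a boxed Long inside the Row, so getAs[Int] on that column may throw a ClassCastException at runtime. If it does, read it as a Long and convert, for example:

val in = edges.groupBy("tgt").agg(sum("weight")).map(n => Node(n.getAs[Int](0), n.getAs[Long](1).toInt)).toDF()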
Or you can use .asInstanceOf[Int].
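That alternative would look something like this (again allowing for the summed column coming back as a Long, as noted above):

val in = edges.groupBy("tgt").agg(sum("weight")).map(n => Node(n(0).asInstanceOf[Int], n(1).asInstanceOf[Long].toInt)).toDF()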
I hope the answer is helpful.
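If the goal is to keep the value from ever becoming Any (the second part of the question), one option not covered in the answer above is to stay in the DataFrame API and cast/rename instead of mapping through a case class; a sketch:

// no map and no Row, so nothing is ever typed as Any; the summed weight is cast back down to int
val in = edges.groupBy("tgt").agg(sum("weight").cast("int").as("weight")).withColumnRenamed("tgt", "node")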