I am trying to create a very simple DataFrame, for example with 3 columns and 3 rows.
I want to have something like this:
+------+---+-----+
|nameID|age| Code|
+------+---+-----+
|  2123| 80| 4553|
| 65435| 10| 5454|
+------+---+-----+
How can I create that DataFrame in Scala (this is just an example)? I have the following program:
import org.apache.spark.sql._
import org.apache.spark.sql.types._
object ejemploApp extends App {
  val schema = StructType(List(
    StructField("name", LongType, true),
    StructField("pandas", LongType, true),
    StructField("id", LongType, true)))

  // createDataFrame must be called inside the object, where schema is in scope;
  // emptyRDD also needs an explicit element type
  val outputDF = sqlContext.createDataFrame(sc.emptyRDD[Row], schema)
}
First problem: it throws a "cannot resolve symbol schema" error at outputDF.
Second problem: how can I add random numbers to the DataFrame?
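For reference, a minimal sketch of the fixed program (assuming Spark 2.x, where a `SparkSession` replaces the separate `sqlContext` and `sc`; the app name and master setting are placeholders). Keeping `createDataFrame` inside the object body is what makes `schema` resolve:

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types._

object EjemploApp extends App {
  // Placeholder session settings for a local run
  val spark = SparkSession.builder()
    .appName("ejemplo")
    .master("local[*]")
    .getOrCreate()

  val schema = StructType(List(
    StructField("name", LongType, true),
    StructField("pandas", LongType, true),
    StructField("id", LongType, true)))

  // schema is visible here because we are still inside the object
  val outputDF = spark.createDataFrame(
    spark.sparkContext.emptyRDD[Row], schema)

  outputDF.printSchema()
}
```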
Answer 0 (score: 3)
I would do something like this:
import scala.util.Random

val nRows = 10
val df = (1 to nRows)
  .map(_ => (Random.nextLong, Random.nextLong, Random.nextLong))
  .toDF("nameID", "age", "Code")
+--------------------+--------------------+--------------------+
| nameID| age| Code|
+--------------------+--------------------+--------------------+
| 5805854653225159387|-1935762756694500432| 1365584391661863428|
| 4308593891267308529|-1117998169834014611| 366909655761037357|
|-6520321841013405169| 7356990033384276746| 8550003986994046206|
| 6170542655098268888| 1233932617279686622| 7981198094124185898|
|-1561157245868690538| 1971758588103543208| 6200768383342183492|
|-8160793384374349276|-6034724682920319632| 6217989507468659178|
| 4650572689743320451| 4798386671229558363|-4267909744532591495|
| 1769492191639599804| 7162442036876679637|-4756245365203453621|
| 6677455911726550485| 8804868511911711123|-1154102864413343257|
|-2910665375162165247|-7992219570728643493|-3903787317589941578|
+--------------------+--------------------+--------------------+
Of course, the ages are not very realistic, but you can shape your random numbers however you like (e.g. using Scala's modulus operator and absolute value), like this:
val df = (1 to nRows)
  .map(id => (id.toLong, Math.abs(Random.nextLong % 100L), Random.nextLong))
  .toDF("nameID", "age", "Code")
+------+---+--------------------+
|nameID|age| Code|
+------+---+--------------------+
| 1| 17| 7143235115334699492|
| 2| 83|-3863778506510275412|
| 3| 31|-3839786144396379186|
| 4| 40| 8057989112338559775|
| 5| 67| 7601061291211506255|
| 6| 71| 7393782421106239325|
| 7| 43| 28349510524075085|
| 8| 50| 539042255545625624|
| 9| 41|-8654000375112432924|
| 10| 82|-1300111870445007499|
+------+---+--------------------+
Edit: make sure you have imported the implicits:
Spark 1.6:
import sqlContext.implicits._
Spark 2:
import sparkSession.implicits._
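If you want the exact table from the question with hand-picked values rather than random ones, a `Seq` of tuples plus `toDF` is the shortest route (a sketch, assuming a Spark 2.x `sparkSession` is already in scope and its implicits are imported):

```scala
import sparkSession.implicits._

// Explicit rows matching the example table from the question
val df = Seq(
  (2123L, 80L, 4553L),
  (65435L, 10L, 5454L)
).toDF("nameID", "age", "Code")

df.show()
```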