Unable to perform DataFrame operations on dF

Time: 2018-03-28 16:33:18

Tags: scala apache-spark dataframe

I am unable to perform DataFrame operations on the result of my query. Please help.

scala> val dF = sqlContext.sql("select * from employeeTable").collect()
dF: Array[org.apache.spark.sql.Row] = 
Array([id,name,age,gender,level,salary], [1,Joseph,23,m,1,50000],
[2,Sharma,25,m,1,55000], [3,Varma,26,m,2,60000], [4,Aj,27,m,3,65000], 
[5,Varun,22,m,1,45000], [6,Ajay,29,m,3,95000], [7,Vijay,31,m,4,125000], 
[8,Kaushik,33,m,5,145000], [9,Gopi,21,m,1,25000], [10,Kumar,27,m,3,75000], 
[11,Kumari,21,f,1,35000], [12,Tina,22,f,2,45000], [13,Alexa,23,f,3,55000], 
[14,Casey,25,f,1,25000])

scala> dF.show()
<console>:28: error: value show is not a member of Array[org.apache.spark.sql.Row]
dF.show()
^

scala> dF.printSchema()
<console>:28: error: value printSchema is not a member of Array[org.apache.spark.sql.Row]
dF.printSchema()
^

scala> dF.head()
<console>:28: error: not enough arguments for method apply: 
(i: Int)Any in trait Row.
Unspecified value parameter i.
dF.head()
^

scala> dF.head(1)
res3: Any = name

scala> dF.head(5)
res4: Any = salary

scala> dF.describe()
<console>:28: error: value describe is not a member of 
Array[org.apache.spark.sql.Row]
dF.describe()
^

scala> dF.count()
<console>:28: error: not enough arguments for method count: (p: 
org.apache.spark.sql.Row => Boolean)Int.
Unspecified value parameter p.
dF.count()
^

scala> dF.count(5)
<console>:28: error: type mismatch;
found : Int(5)
required: org.apache.spark.sql.Row => Boolean
dF.count(5)
^

scala> dF.distinct()
<console>:28: error: not enough arguments for method apply: 
(i: Int)org.apache.spark.sql.Row in class Array.
Unspecified value parameter i.
dF.distinct()
^

scala> dF.collect()
<console>:28: error: not enough arguments for method collect: 
(pf: PartialFunction[org.apache.spark.sql.Row,B])(implicit bf:
scala.collection.generic.CanBuildFrom[Array[org.apache.spark.sql.Row],B,That])
That.
Unspecified value parameter pf.
dF.collect()
^

scala> dF.head(3)
res10: Any = gender

1 Answer:

Answer 0 (score: 0)

The correct way to create a DataFrame is:

val df = spark.sql("SQL EXPRESSION")

Solution

Remove the collect() call so that dF stays a DataFrame.
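As a minimal sketch of that fix, assuming Spark 2.x or later with a local SparkSession: the object name `EmployeeDataFrameSketch`, the helper `employees`, and the three sample rows are hypothetical and exist only so the example runs without the asker's real employeeTable. On Spark 1.x the asker's `sqlContext.sql(...)` likewise returns a DataFrame when collect() is left off.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object EmployeeDataFrameSketch {

  // Registers a small, made-up employeeTable and queries it WITHOUT collect(),
  // so the result is a DataFrame rather than an Array[Row].
  def employees(spark: SparkSession): DataFrame = {
    import spark.implicits._

    Seq(
      (1, "Joseph", 23, "m", 1, 50000),
      (2, "Sharma", 25, "m", 1, 55000),
      (3, "Varma", 26, "m", 2, 60000)
    ).toDF("id", "name", "age", "gender", "level", "salary")
      .createOrReplaceTempView("employeeTable")

    spark.sql("select * from employeeTable")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("employee-df-sketch")
      .master("local[*]")
      .getOrCreate()

    val dF = employees(spark)

    // The calls that failed in the question are all DataFrame methods.
    dF.show()
    dF.printSchema()
    dF.describe().show()
    println(dF.count())
    dF.distinct().show()
    dF.head(2).foreach(println)

    // Call collect() only at the end, when a local Array[Row] is really needed.
    val rows = dF.collect()
    println(rows.length)

    spark.stop()
  }
}
```

Keeping the DataFrame in `dF` and collecting into a separate value avoids mixing up the two APIs.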

Explanation

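With your current code you get an Array instead of a DataFrame, because collect() returns the rows to the driver as an Array[Row]. If you keep the Array, you can take a single Row out of dF and use the methods of the Row class, for example: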

dF(0).schema

dF(1).get(0)
dF(1).get(1)
dF(1).get(...)
dF(1).get(5)
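The errors in the question follow from ordinary Scala Array semantics, which the sketch below tries to illustrate. To keep it runnable without Spark, `Vector[Any]` is used as a hypothetical stand-in for Row, and the object name `ArrayVsDataFrame` is made up for this example; only the indexing and the method shapes carry over to Array[Row].

```scala
object ArrayVsDataFrame {

  // Stand-in for Array[Row]: each Vector[Any] plays the role of one Row.
  // The first element is the header row, as in the question's output.
  val rows: Array[Vector[Any]] = Array(
    Vector("id", "name", "age", "gender", "level", "salary"),
    Vector("1", "Joseph", "23", "m", "1", "50000"),
    Vector("2", "Sharma", "25", "m", "1", "55000")
  )

  // rows.head is the first element; the (1) then indexes into that element.
  // This is why dF.head(1) returned a single field instead of one row.
  val secondFieldOfFirstRow: Any = rows.head(1)

  // Array.count requires a predicate; length gives the number of elements.
  val rowCount: Int = rows.length
  val rowsAtLevelOne: Int = rows.count(r => r(4) == "1")

  // Array.distinct is declared without a parameter list, so it is called
  // without parentheses; distinct() is parsed as distinct.apply().
  val distinctRows: Array[Vector[Any]] = rows.distinct
}
```

In the question's output the first element of dF is the header row, which is why `dF.head(1)` printed `name` and `dF.head(5)` printed `salary`.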