I want to get the Zhvi values from a CSV file using Spark, rounded to integers and sorted in descending order.
But when I try sort(desc("Zhvi")) at the end of my code, it always gives me an error.
from pyspark.sql.functions import col, desc
stateByZhvi = home.select('State','Zhvi').groupBy((col("State"))).avg("Zhvi").show()
Part of my result:
+-----+------------------+
|State| avg(Zhvi)|
+-----+------------------+
| AZ|246687.01298701297|
| SC|143188.94736842104|
| LA|159991.74311926606|
| MN|236449.40239043825|
| NJ| 367156.5637065637|
| DC| 586109.5238095238|
| OR| 306646.3768115942|
| VA| 282764.4986449864|
Can anyone help?
Answer 0 (score: 2)
// input dataframe
+-----+------------------+
|State| avg|
+-----+------------------+
| AZ|246687.01298701297|
| SC|143188.94736842104|
| LA|159991.74311926606|
+-----+------------------+
df.orderBy(desc("avg")).show()
//
+-----+------------------+
|State| avg|
+-----+------------------+
| AZ|246687.01298701297|
| LA|159991.74311926606|
| SC|143188.94736842104|
+-----+------------------+
There may be another issue: you seem to be using sort(desc("Zhvi")), but after the avg aggregation the column is renamed, so the output header is |State| avg(Zhvi) | rather than Zhvi.
Thanks.
Answer 1 (score: 0)
How about using SQL:
home.createOrReplaceTempView("home")
spark.sql("select State, round(avg(Zhvi)) as avg_Zhvi from home group by State order by 2 desc").show()
Answer 2 (score: 0)
I ran into the same problem you did; here is my solution. Use agg with avg, alias the result, and order by it with the ascending parameter set to False:
from pyspark.sql.functions import avg, col
stateByZhvi = home.groupBy(col("State")).agg(avg(col("Zhvi")).alias("avg_Zhvi")).orderBy("avg_Zhvi", ascending=False).select("State", "avg_Zhvi")
stateByZhvi.show()