How to calculate weekly averages from a table in PySpark?

Posted: 2018-09-03 09:55:37

Tags: python sql apache-spark pyspark

My table looks like this:

+--------+-----+----+----------+
|  grDate|Items|rate|  unitName|
+--------+-----+----+----------+
|20131222|  SOY|34.7|Samastipur|
|20131223|  SOY|34.7|Samastipur|
|20131224|  SOY|34.7|Samastipur|
|20131225|  SOY|34.0|Samastipur|
|20131225|  SOY|34.7|Samastipur|
|20131227|  SOY|34.7|Samastipur|
|20131228|  SOY|34.7|Samastipur|
|20131229|  SOY|34.7|Samastipur|
|20131230|  SOY|34.6|Samastipur|
|20131230|  SOY|34.7|Samastipur|
|20131231|  SOY|34.7|Samastipur|
|20140101|  SOY|34.6|Samastipur|
|20140102|  SOY|34.6|Samastipur|
|20140103|  SOY|34.6|Samastipur|
|20140106|  SOY|34.6|Samastipur|
|20140107|  SOY|34.6|Samastipur|
|20140110|  SOY|33.9|Samastipur|
|20140111|  SOY|33.9|Samastipur|
|20140112|  SOY|33.9|Samastipur|
|20140113|  SOY|33.9|Samastipur|
+--------+-----+----+----------+
only showing top 20 rows

What I want is to compute the weekly average of the rate column and create a new table containing those values.

I am trying to run this query:

spjsoya.createOrReplaceTempView("spjsoya")
spark.sql("SELECT grDate , (sum(total)/(WEEK(MAX(rate)) - WEEK(MIN(rate))+1)) AS rate FROM spjsoya  WHERE YEAR(grDate)='2013' GROUP BY grDate").show()

but it gives me this error:

pyspark.sql.utils.AnalysisException: u"Undefined function: 'WEEK'. This function is neither a registered temporary function nor a permanent function registered in the database 'default'.; line 1 pos 29"

How do I write a correct query that averages the rate per week?

1 Answer:

Answer 0: (score: 0)

Below is the code for your problem:

Import
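
A minimal sketch of such code, assuming grDate is stored as a yyyyMMdd string (spjsoya is the DataFrame from the question). The key point is that Spark SQL's built-in week function is weekofyear(), not WEEK(), which is why the original query raised the AnalysisException:

from pyspark.sql import functions as F

# Parse the yyyyMMdd string into a proper date, then derive the
# year and ISO week number. Spark SQL has no WEEK() function;
# the built-in equivalent is weekofyear().
weekly = (
    spjsoya
    .withColumn("date", F.to_date("grDate", "yyyyMMdd"))
    .groupBy(F.year("date").alias("year"),
             F.weekofyear("date").alias("week"))
    .agg(F.avg("rate").alias("rate"))
    .orderBy("year", "week")
)
weekly.show()

An equivalent raw-SQL version, closer to the query in the question:

spjsoya.createOrReplaceTempView("spjsoya")
spark.sql("""
    SELECT year(to_date(grDate, 'yyyyMMdd'))       AS year,
           weekofyear(to_date(grDate, 'yyyyMMdd')) AS week,
           avg(rate)                               AS rate
    FROM spjsoya
    GROUP BY year(to_date(grDate, 'yyyyMMdd')),
             weekofyear(to_date(grDate, 'yyyyMMdd'))
    ORDER BY year, week
""").show()

Note that the two-argument to_date(str, fmt) requires Spark 2.2 or later; on older versions you would parse the date with to_date(from_unixtime(unix_timestamp(grDate, 'yyyyMMdd'))) instead.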