我正在尝试对列进行分组并采用最少的列。然后使用最小值计算与日期的差异。但是,当我使用最小日期列时,出现以下错误:
raise AnalysisException(s.split(': ', 1)[1], stackTrace)
pyspark.sql.utils.AnalysisException: u"grouping expressions sequence is empty, and 'table.`lbrnm`' is not an aggregate function. Wrap '(datediff(CAST(CAST('2019-02-28 01:00:00' AS TIMESTAMP) AS DATE), CAST(CAST(concat(CASE WHEN (CAST((CAST(`min_date` AS DECIMAL(7,0)) / CAST(CAST(1000000 AS DECIMAL(7,0)) AS DECIMAL(7,0))) AS DOUBLE) = CAST('1' AS DOUBLE)) THEN '20' ELSE '' END, CASE WHEN (CAST((CAST(`min_date` AS DECIMAL(7,0)) / CAST(CAST(1000000 AS DECIMAL(7,0)) AS DECIMAL(7,0))) AS DOUBLE) = CAST('0' AS DOUBLE)) THEN '19' ELSE '' END, substring(substring(CAST(min(table.lspf.`lsdte`) AS STRING), 0, 3), -2, 2),
这是我的代码:
j = lspf_ret.groupBy(col("lsbrnm"),
col("lsdlp"),
col("lsdlr"))
.agg(min(col('lsdte')))
j.select('lsbrnm','lsdlp','lsdlr',col('min(lsdte)').alias('min_date'))
.select('lsbrnm',
'lsdlp',
'lsdlr',
'min_date',
datediff(lit('2019-02-28 01:00:00').cast(TimestampType()),
concat(when(col("min_date")/1000000=='1','20').otherwise(''),
when(col("min_date")/1000000=='0','19').otherwise(''),
right(left(min(lspf.lsdte).cast(StringType()),3),2),
lit('-'),
left(right(min(lspf.lsdte).cast(StringType()),4),2),
lit('-'),
right(min(lspf.lsdte).cast(StringType()),2),
lit(' 00:00:00') ).cast(TimestampType())))
以下是汇总输出:
|lsbrnm|lsdlp| lsdlr|min(lsdte)|
+------+-----+-------------+----------+
| 2266| EF4| 171001370957| 1190201|
| 2266| EF4| 131201027045| 1171130|
| 2266| EF4| 140901072492| 1170301|
| 2266| EF4| 160901268734| 1180925|
| 2266| EF4| 161101289209| 1170929|
| 2266| EA4| 18501424940R| 1190220|
这是所需的输出:
|lsbrnm|lsdlp| lsdlr| min_date|difference
+------+-----+-------------+----------++----------+
| 2266| EF4| 171001370957| 1190201| 27|
| 2266| EF4| 131201027045| 1171130| 275|
| 2266| EF4| 140901072492| 1170301| 1|
| 2266| EF4| 160901268734| 1180925| 209|
| 2266| EF4| 161101289209| 1170929| 213|
| 2266| EA4| 18501424940R| 1190220| 8|
lspf.lsdte的示例:
lsdte
+------+
1190201
1171130
1170301
1180925
1170929
1190220
我要去哪里错了?