I want to filter rows based on the value of a column.
from pyspark.sql import functions as F
df = dataSource0.toDF()
transDf = df.withColumn("daily_qty", F.split(F.concat_ws(',',*(x for x in sorted(df.columns) if x.startswith('daily_qty_'))),','))
ds2=transDf.withColumn("start_date",F.col("startdt_for_dailyqty")).withColumn("end_date",F.date_add(F.col("startdt_for_dailyqty"),48))
ds4 = (ds2.withColumn("lineoffdate", F.expr("sequence(start_date, end_date, interval 1 day)"))
          .withColumn("temp", F.arrays_zip("daily_qty", "lineoffdate"))
          .withColumn("temp", F.explode("temp")))
ds4.select("part_no","prod_week","temp.daily_qty","temp.lineoffdate").show()
ds4.printSchema()
ds4.filter(F.col('temp.daily_qty') != '000000000')
print("after filter:", ds4.count())
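The filter does not work on this field, although it works on other fields. I also tried `array_contains` on `daily_qty` in the filter, and that did not work either. What am I missing here? Sample data below: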
+------------+---------+----------+-----------+
| part_no|prod_week| daily_qty|lineoffdate|
+------------+---------+----------+-----------+
|019990616100| 202004| 000000000| 2020-01-23|
|019990616100| 202004| 000000000| 2020-01-24|
|019990616100| 202004| 000000000| 2020-01-25|
|019990616100| 202005| 000000001| 2020-01-26|
+------------+---------+----------+-----------+
I am exploding two columns, but the filter does not work on the exploded column.
Answer 0 (score: 0)
The data in the "daily_qty" column looks like it is prefixed with a space character.