PySpark: filtering a DataFrame on multiple conditions

Time: 2021-01-28 13:34:31

Tags: apache-spark pyspark apache-spark-sql

My input Spark DataFrame is:

Year  Month        Client 
2018  1            1        
2018  2            1         
2018  3            1         
2018  4            1         
2018  5            1         
2018  6            1        
2018  7            1        
2018  8            1        
2018  9            1         
2018  10           1          
2018  11           1        
2018  12           1    
2019  1            1        
2019  2            1         
2019  3            1         
2019  4            1         
2019  5            1         
2019  6            1        
2019  7            1        
2019  8            1        
2019  9            1         
2019  10           1          
2019  11           1        
2019  12           1  
2018  1            2        
2018  2            2         
2018  3            2         
2018  4            2         
2018  5            2         
2018  6            2        
2018  7            2        
2018  8            2        
2018  9            2         
2018  10           2        
2018  11           2        
2018  12           2        
2019  1            2        
2019  2            2         
2019  3            2         
2019  4            2         
2019  5            2         
2019  6            2        
2019  7            2        
2019  8            2        
2019  9            2         
2019  10           2        
2019  11           2        
2019  12           2      

The DataFrame is sorted by Client, Year, and Month. For each client, I want to extract the data after 2019-06.

Based on the data above, this is the output I want:

Year  Month        Client 
2018  1            1        
2018  2            1         
2018  3            1         
2018  4            1         
2018  5            1         
2018  6            1        
2018  7            1        
2018  8            1        
2018  9            1         
2018  10           1          
2018  11           1        
2018  12           1    
2019  1            1        
2019  2            1         
2019  3            1         
2019  4            1         
2019  5            1         
2019  6            1        
2018  1            2        
2018  2            2         
2018  3            2         
2018  4            2         
2018  5            2         
2018  6            2        
2018  7            2        
2018  8            2        
2018  9            2         
2018  10           2        
2018  11           2        
2018  12           2        
2019  1            2        
2019  2            2         
2019  3            2         
2019  4            2         
2019  5            2         
2019  6            2        

Can you help me solve this?


1 answer:

Answer 0 (score: 1)

Did you mean before 2019-06? (You wrote "after 2019-06", but your expected output stops at June 2019.)

If so, a single filter that keeps every row up to and including June 2019 will do:

df2 = df.filter('Year < 2019 or (Year = 2019 and Month <= 6)')