Check whether any combination of values exists in each partition of a PySpark DataFrame

Time: 2020-10-13 21:30:18

Tags: pyspark apache-spark-sql pyspark-dataframes

I have two PySpark DataFrames. DataFrame A contains the following data:

Shop Customer iteam_sold  
A    C1         bread      
A    C1         egg        
A    C2         jam        
A    C2         rice       
A    C2         bread      
B    C2         bread      
B    C2         apple      
B    C2         milk       
B    C3         milk       
B    C3         egg 

   

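DataFrame B has two columns of item details. Each row is a pair of items, and a partition must be flagged when both items of a pair occur together in it: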

iteam_1  iteam_2  
 milk      egg
 bread     milk

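After partitioning the data in DataFrame A by Shop and Customer, I want to check whether any partition contains, in its iteam_sold column, both the iteam_1 and the iteam_2 of some row of DataFrame B. Every row of a matching partition should get Flag = True, and every other row False.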

Expected output DataFrame:

Shop Customer iteam_sold  Flag
A    C1         bread      False
A    C1         egg        False
A    C2         jam        False
A    C2         rice       False
A    C2         bread      False
B    C2         bread      True
B    C2         apple      True
B    C2         milk       True
B    C3         milk       True
B    C3         egg        True

0 answers:

No answers