I have two PySpark DataFrames. DataFrame A contains the following data:
Shop Customer iteam_sold
A C1 bread
A C1 egg
A C2 jam
A C2 rice
A C2 bread
B C2 bread
B C2 apple
B C2 milk
B C3 milk
B C3 egg
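DataFrame B has two item columns. Each row is a combination that must be flagged when both of its items appear together in the same partition: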
iteam_1 iteam_2
milk egg
bread milk
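After partitioning DataFrame A by Shop and Customer, I want to check whether any partition's iteam_sold column contains both iteam_1 and iteam_2 of some row in DataFrame B. If it does, every row in that partition should get Flag = True; otherwise Flag = False.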
Expected output DataFrame:
Shop Customer iteam_sold Flag
A C1 bread False
A C1 egg False
A C2 jam False
A C2 rice False
A C2 bread False
B C2 bread True
B C2 apple True
B C2 milk True
B C3 milk True
B C3 egg True