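I want to create a unique ID based on a column of a DataFrame that contains several groups. I have defined an ID for each group in a dictionary. How can I add this ID to the DataFrame using that dictionary?
Below are the sample data and my code: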
conf_1 = {
    'cat': {
        '1': ['A_10', 'A_13'],
        '2': ['B_8', 'B_4'],
        '3': ['A_11', 'A_13'],
    },
}
testlist = [
    {"cat": "A_10", "val": 10},
    {"cat": "A_13", "val": 11},
    {"cat": "B_8", "val": 12},
    {"cat": "B_4", "val": 14},
    {"cat": "A_11", "val": 9},
    {"cat": "A_13", "val": 16},
]
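from functools import reduce
from pyspark.sql import functions as f
spark_df = spark.createDataFrame(testlist)
# Tag each group's rows with its ID from conf_1, then union the pieces.
# Note: A_13 is listed under groups '1' and '3', so its rows match both filters.
parts = [spark_df.filter(f.col('cat').isin(members)).withColumn('id', f.lit(group_id))
         for group_id, members in conf_1['cat'].items()]
df = reduce(lambda a, b: a.unionByName(b), parts)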
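One possible alternative to filtering group by group is to invert the dictionary into a category-to-ID lookup and attach the ID in a single pass. The sketch below is only an illustration under stated assumptions: `build_lookup` and `add_group_id` are hypothetical helper names, and because `A_13` is listed under both group `'1'` and group `'3'`, it assumes the first group that lists a category wins. The sample data alone does not say which group each `A_13` row should belong to, so that rule may not match the intended output.

```python
from itertools import chain

# Same shape as conf_1 in the question: group ID -> member categories.
conf_1 = {
    'cat': {
        '1': ['A_10', 'A_13'],
        '2': ['B_8', 'B_4'],
        '3': ['A_11', 'A_13'],
    },
}


def build_lookup(groups):
    """Invert {group_id: [members]} into {member: group_id}.

    Assumption: if a member appears under several groups (A_13 here),
    the first group that lists it wins.
    """
    lookup = {}
    for group_id, members in groups.items():
        for member in members:
            lookup.setdefault(member, group_id)
    return lookup


def add_group_id(df, lookup, col='cat', id_col='id'):
    """Return df with id_col looked up from col through a literal map column."""
    # Imported here so the dictionary logic above runs without pyspark installed.
    from pyspark.sql import functions as f
    # create_map expects alternating key and value columns: k1, v1, k2, v2, ...
    mapping = f.create_map(*[f.lit(x) for x in chain.from_iterable(lookup.items())])
    # Categories missing from the lookup get a null ID.
    return df.withColumn(id_col, mapping[f.col(col)])


lookup = build_lookup(conf_1['cat'])
# With the spark_df from the question:
# spark_df = add_group_id(spark_df, lookup)
```

This keeps exactly one output row per input row, whereas filtering per group and unioning the results would return the `A_13` rows once for each group that lists them.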
The final output should look like this: