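How can I efficiently drop all columns that contain only a single value from a DataFrame?
I found two approaches:
This approach ignores null and only counts the other values, but in my case null has to count as a value: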
from pyspark.sql.functions import col, countDistinct

# apply countDistinct on each column (nulls are not counted)
col_counts = partsDF.agg(*(countDistinct(col(c)).alias(c) for c in partsDF.columns)).collect()[0].asDict()
This approach takes too long:
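# count() returns a plain int, so it cannot be passed to agg(); build the dict directly
# (this runs one Spark job per column; distinct() keeps null as its own value)
col_counts = {c: partsDF.select(c).distinct().count() for c in partsDF.columns}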
# collect the columns whose distinct count is 1
# (loop variable is c, not col, to avoid shadowing pyspark.sql.functions.col)
cols_to_drop = [c for c in partsDF.columns if col_counts[c] == 1]