Get the columns of a PySpark DataFrame where the value equals 0

Time: 2017-06-02 18:23:23

Tags: pyspark spark-dataframe

I am trying to get the columns of a DataFrame whose value is 0, but I have not been able to do it. Has anyone tried the same thing?

1 answer:

Answer 0: (score: 0)

# DataFrame with one row (assumes an active SparkContext `sc` and SparkSession)
row = [[1,0,0,1,2,3,4,0,0,0]]
df = sc.parallelize(row).toDF(['Col1','Col2','Col3','Col4','Col5','Col6','Col7','Col8','Col9','Col10'])
df.show()

# Assuming the DataFrame has only one row, take that row as a dict
row_dict = df.first().asDict()

# Collect the names of the columns whose value is 0
zeroCol = []
for key in df.columns:
    if row_dict[key] == 0:
        zeroCol.append(key)

print(zeroCol)

+----+----+----+----+----+----+----+----+----+-----+
|Col1|Col2|Col3|Col4|Col5|Col6|Col7|Col8|Col9|Col10|
+----+----+----+----+----+----+----+----+----+-----+
|   1|   0|   0|   1|   2|   3|   4|   0|   0|    0|
+----+----+----+----+----+----+----+----+----+-----+

['Col2', 'Col3', 'Col8', 'Col9', 'Col10']
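
The column selection itself does not need Spark once the row has been collected to the driver. As a minimal sketch, assuming the row is available as a plain dict (for example from `df.first().asDict()`), the check can be written as a small helper. The name `zero_columns` and the `sample` dict are hypothetical and only stand in for the one-row DataFrame above.

```python
def zero_columns(row_dict, columns=None):
    """Return the names of the columns whose value in row_dict equals 0.

    `columns` optionally fixes the order of the result (e.g. df.columns);
    by default the dict's own key order is used.
    """
    names = columns if columns is not None else list(row_dict)
    return [name for name in names if row_dict[name] == 0]


# Hypothetical stand-in for df.first().asDict() on the one-row DataFrame above
sample = {
    'Col1': 1, 'Col2': 0, 'Col3': 0, 'Col4': 1, 'Col5': 2,
    'Col6': 3, 'Col7': 4, 'Col8': 0, 'Col9': 0, 'Col10': 0,
}

zero_cols = zero_columns(sample)
print(zero_cols)
```

With a real DataFrame the call would look like `zero_columns(df.first().asDict(), df.columns)`. This only inspects the first row, so it suits the single-row case described in the answer.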