How to hide/exclude DataFrame columns based on another table

Time: 2018-02-01 06:48:22

Tags: python apache-spark dataframe pyspark

I need to hide some columns of a DataFrame based on another DataFrame/table that holds the list of columns to hide.

df

+------+----------+-------+---+---+------+----+----+----+
|Gender| Mobilenum|address|age| id|id_row|name|role|unit|
+------+----------+-------+---+---+------+----+----+----+
|     M|  96226126| SDF-03| 24|101|     1| ash| SSE| DNA|
|     M| 961267126| DSR-09| 24|102|     2|sony|  TA| DNA|
|     M|  96226126| DDD-09| 24|103|     3|zoro|  PM| DNA|
|     M|3962267126| DFG-07| 24|104|     4| max|  SM| DNA|
|     M| 902267126| ASC-09| 24|105|     5| ben| VPM| DNA|
+------+----------+-------+---+---+------+----+----+----+

df_col

+---------+
|   column|
+---------+
|   id_row|
|Mobilenum|
|  address|
|      age|
|   Gender|
+---------+

Here I need to hide the columns of df that are listed in df_col.

Expected output

+---+----+----+----+
| id|name|role|unit|
+---+----+----+----+
|101| ash| SSE| DNA|
|102|sony|  TA| DNA|
|103|zoro|  PM| DNA|
|104| max|  SM| DNA|
|105| ben| VPM| DNA|
+---+----+----+----+

1 Answer:

Answer 0: (score: 1)

Try the following code.

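# Collect the rows of df_col on the driver and extract the names to hide
c1_L = df_col.rdd.collect()
c1_L1 = [x.column for x in c1_L]
# Keep only the columns of df that are not in the hidden list
c_L = df.columns
final_df = df.select([x for x in c_L if x not in c1_L1])
final_df.show()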

Output

+---+----+----+----+
| id|name|role|unit|
+---+----+----+----+
|101| ash| SSE| DNA|
|102|sony|  TA| DNA|
|103|zoro|  PM| DNA|
|104| max|  SM| DNA|
|105| ben| VPM| DNA|
+---+----+----+----+
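
The same idea can be wrapped in a small reusable helper. This is only a sketch: the function names `visible_columns` and `hide_columns` and the `name_field` parameter are made up for illustration and are not part of PySpark. It assumes `df_col` is small enough to collect onto the driver and that its name column is called `column`, as in the question.

```python
def visible_columns(all_columns, hidden_columns):
    """Return the entries of all_columns that are not hidden, keeping their order."""
    hidden = set(hidden_columns)  # set lookup avoids rescanning the list per column
    return [c for c in all_columns if c not in hidden]


def hide_columns(df, df_col, name_field="column"):
    """Hypothetical helper: select every column of df not named in df_col[name_field].

    df and df_col are expected to be PySpark DataFrames.
    """
    # collect() brings the (assumed small) list of names to the driver
    hidden = [row[name_field] for row in df_col.select(name_field).collect()]
    return df.select(visible_columns(df.columns, hidden))


# Usage with the DataFrames from the question:
# final_df = hide_columns(df, df_col)
# final_df.show()
```

Once the names are collected into a list, `df.drop(*hidden)` should give the same result in one call, since `DataFrame.drop` accepts several column names and ignores names that are not in the schema.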