Question

I have a DataFrame with the following columns:

id   1id  id2  ac1  2ac tre tye

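I want to drop the columns whose names contain "id" or "ac" and keep the other columns.

How can I achieve this in PySpark?

The select statements I tried did not work.

How can I use a regex on the column names here?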

Answer 1

Use a simple list comprehension:

Using select

from pyspark.sql.functions import col

df.select(*[col(c) for c in df.columns if not ("id" in c or "ac" in c)]).show()

Using drop

df.drop(*[c for c in df.columns if "id" in c or "ac" in c]).show()
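
The question also asks about using a regex. One possible approach, sketched here under assumptions, is to filter the column names with Python's standard `re` module before passing them to select. The helper name `columns_to_keep` and the pattern `id|ac` are illustrative and not part of the original answer; the DataFrame is assumed to be named `df`.

```python
import re

# Hypothetical helper: return the column names that do NOT match the pattern.
def columns_to_keep(columns, pattern=r"id|ac"):
    regex = re.compile(pattern)
    # re.search matches anywhere in the name, like a substring `in` check
    return [c for c in columns if not regex.search(c)]

# Column names taken from the question
columns = ["id", "1id", "id2", "ac1", "2ac", "tre", "tye"]
kept = columns_to_keep(columns)
print(kept)  # ['tre', 'tye']

# With a real DataFrame (assumed to be named df), this would be:
# df.select(*columns_to_keep(df.columns)).show()
```

Because the pattern matches anywhere in the name, it would also drop a column such as "face" or "width". If that is too broad, a stricter pattern (for example one anchored with `^` or `$`) may be a better fit.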