Question

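I have a DataFrame whose columns hold monthly data, e.g. 2000-01, 2000-02 ... 2015-12.

I need to drop every column that is not quarterly data, i.e. keep only the columns for months -01, -04, -07 and -10 and remove all the other monthly columns.

I tried df1 = df[df.columns[~df.columns.str.endswith('02')]], which lets me drop all columns ending in 02.

But I don't want to repeat this code eight times, once per month. Is there a way to pass a list and have the code drop every column ending in any of the strings in the list, for example

df1 = df[df.columns[~df.columns.str.endswith('02','03','05','06')]]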

Answer 1

To do what you're asking, you can write a function that, given the columns, determines which ones are "quarter" months. Example:

> import pandas as pd
> def quarter_columns(cols):
      # keep labels whose month part is 01, 04, 07 or 10
      return [col for col in cols if int(col.split('-')[1]) % 3 == 1]
> df = pd.DataFrame({'2018-{0:02}'.format(month): [1] for month in range(1, 8)})
> df
        2018-01     2018-02     2018-03     2018-04     2018-05     2018-06     2018-07
    0   1           1           1           1           1           1           1
> df[quarter_columns(df.columns)]
        2018-01     2018-04     2018-07
    0   1           1           1

Another approach is to write a function that applies `endswith` several times and combines the results. Example:

> import numpy as np
> def multiple_endswith(cols, strs):
      # True for every label that ends with at least one of the given suffixes
      return np.logical_or.reduce([cols.str.endswith(st) for st in strs])
> df[df.columns[~multiple_endswith(df.columns, ['02', '03', '05', '06', '08', '09', '11', '12'])]]
        2018-01     2018-04     2018-07
    0   1           1           1
> df[df.columns[multiple_endswith(df.columns, ['01', '04', '07', '10'])]]
        2018-01     2018-04     2018-07
    0   1           1           1
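
If the goal is simply to pass several suffixes at once, Python's built-in `str.endswith` accepts a tuple of suffixes, so a list comprehension over the column labels may be enough. A minimal sketch, assuming the labels are plain strings of the form YYYY-MM; the names `drop_suffixes` and `keep` and the sample frame are illustrative only:

```python
import pandas as pd

# Illustrative frame: one column per month of 2018, labelled 'YYYY-MM'.
df = pd.DataFrame({'2018-{:02d}'.format(m): [1] for m in range(1, 13)})

# Suffixes to drop (hypothetical name). The built-in str.endswith takes a
# tuple of suffixes, not a list.
drop_suffixes = ('02', '03', '05', '06', '08', '09', '11', '12')

# Keep every label that ends with none of the suffixes.
keep = [col for col in df.columns if not col.endswith(drop_suffixes)]
df1 = df[keep]

print(list(df1.columns))
```

This works on the Python strings themselves rather than on the `.str` accessor, so it does not depend on which pandas version is installed.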

Answer 2

You can use a boolean mask:
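
A minimal sketch of what such a mask could look like, assuming the column labels are strings of the form YYYY-MM. The names `quarter_months` and `mask` and the sample frame are illustrative and not taken from the answer:

```python
import pandas as pd

# Illustrative frame: one column per month of 2000, labelled 'YYYY-MM'.
df = pd.DataFrame({'2000-{:02d}'.format(m): [m] for m in range(1, 13)})

# Months that start a quarter (assumed naming, not from the original answer).
quarter_months = ['01', '04', '07', '10']

# Boolean mask: True where the last two characters of the label
# are one of the quarter months.
mask = df.columns.str[-2:].isin(quarter_months)

# Keep only the columns selected by the mask.
df1 = df.loc[:, mask]

print(list(df1.columns))
```

Inverting the mask with `~mask` would select the columns to drop instead.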
