Question

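I am trying to drop a group of columns from a dataset. All of the variables to be dropped end with the text "prefix".

I did manage to "collect" them into a group using the following:

I then tried a series of ways to drop that group, all of which raised various errors. Can anyone suggest a way to drop these columns?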

Answer 1

# .ix has been removed from pandas; .loc accepts the same boolean mask
df2 = df.loc[:, ~df.columns.str.endswith('prefix')]
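
As a minimal sketch of the boolean-mask approach, here it is on a made-up frame (the column names are illustrative and not taken from the question). It uses `.loc`, since `.ix` is no longer available in recent pandas releases.

```python
import pandas as pd

# Hypothetical frame; the column names are made up for illustration.
df = pd.DataFrame({'oneprefix': [1, 2], 'one': [3, 4],
                   'prefixtwo': [5, 6], 'twoprefix': [7, 8]})

# True for every column whose name ends with 'prefix'
mask = df.columns.str.endswith('prefix')

# ~mask inverts the selection, so only the other columns are kept
df2 = df.loc[:, ~mask]
print(list(df2.columns))
```

Note that `prefixtwo` survives here, because the test is on the end of the name only.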

Answer 2

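Use filter with a regex. Note that this pattern excludes every column whose name contains 'prefix' anywhere, not only at the end.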

df.filter(regex=r'^((?!prefix).)*$')
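
The pattern above uses a negative lookahead at every character, so it rejects a name that contains 'prefix' at any position. If only a trailing 'prefix' should disqualify a column, a variant such as `^(?!.*prefix$)` might be closer to what the question asks. This variant is an assumption of this sketch, not part of the original answer, and the frame is made up for illustration.

```python
import pandas as pd

# Hypothetical one-row frame with 'prefix' at the start and at the end of names.
df = pd.DataFrame([[1, 2, 3]], columns=['prefixcol1', 'col2prefix', 'colN'])

# Original pattern: keeps only names with no 'prefix' anywhere
anywhere = df.filter(regex=r'^((?!prefix).)*$')

# Assumed variant: keeps every name that does not END with 'prefix'
at_end = df.filter(regex=r'^(?!.*prefix$)')

print(list(anywhere.columns))
print(list(at_end.columns))
```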

Demo

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(2, 6),
                  columns=['oneprefix', 'one',
                           'twoprefix', 'two',
                           'threeprefix', 'three'])

df.filter(regex=r'^((?!prefix).)*$')

Where:

df

Timing

All of the approaches take about the same time.
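
To check the timing claim on your own machine, a rough sketch with `timeit` could look like the following. The choice of candidates and the repetition count are assumptions of this sketch; no measured numbers are claimed here, and results on a 2x6 frame say little about wide frames.

```python
import timeit

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(2, 6),
                  columns=['oneprefix', 'one',
                           'twoprefix', 'two',
                           'threeprefix', 'three'])

# Three of the approaches from the answers, wrapped as zero-argument callables
candidates = {
    'endswith mask': lambda: df.loc[:, ~df.columns.str.endswith('prefix')],
    'filter regex': lambda: df.filter(regex=r'^((?!prefix).)*$'),
    'list comprehension': lambda: df[[c for c in df.columns
                                      if not c.endswith('prefix')]],
}

# Run each once to confirm they select the same columns on this frame
results = {name: fn() for name, fn in candidates.items()}

for name, fn in candidates.items():
    seconds = timeit.timeit(fn, number=200)
    print(f'{name}: {seconds:.4f} s for 200 calls')
```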

Answer 3

For completeness:

In [306]: df
Out[306]:
   prefixcol1  col2prefix  col3prefix  colN
0           1           1           1     1
1           2           2           2     2
2           3           3           3     3

In [307]: df.loc[:, ~df.columns.str.contains('prefix$')]
Out[307]:
   prefixcol1  colN
0           1     1
1           2     2
2           3     3

Or another variant:

In [388]: df.loc[:, [re.search(r'prefix$', str(x)) is None for x in df.columns]]  # df.select was removed from pandas
Out[388]:
   prefixcol1  colN
0           1     1
1           2     2
2           3     3
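
Because `str.contains` treats its argument as a regular expression by default, a suffix containing characters such as `.` or `+` would need escaping. A small sketch of that idea follows; the helper name `drop_suffix_columns` and the sample frame are hypothetical and not part of the original answer.

```python
import re

import pandas as pd


def drop_suffix_columns(frame, suffix):
    """Hypothetical helper: return frame without columns ending in suffix."""
    # re.escape makes the suffix match literally; '$' anchors it at the end
    pattern = re.escape(suffix) + '$'
    names = frame.columns.astype(str)
    return frame.loc[:, ~names.str.contains(pattern)]


# 'axb' would also match an unescaped '.b$', so it makes a useful test case
df = pd.DataFrame([[1, 2, 3]], columns=['a.b', 'axb', 'c'])
kept = drop_suffix_columns(df, '.b')
print(list(kept.columns))
```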

Answer 4

I think you need:

not_prefix_cols = [col for col in df.columns if 'prefix' not in col]
df2 = df[not_prefix_cols]

But it is better to use endswith, which only tests the end of each name:

prefix_cols= [col for col in df.columns if not col.endswith('prefix')]
print (df[prefix_cols])

Sample:

import pandas as pd

df = pd.DataFrame({'prefixone' : pd.Series([1, 2, 3, 4]),
                   'twoprefix' : pd.Series([20, 30, 40, 50]),
                   'two1prefix' : pd.Series([20, 30, 40, 50])})

print (df)
   prefixone  two1prefix  twoprefix
0          1          20         20
1          2          30         30
2          3          40         40
3          4          50         50

prefix_cols= [col for col in df.columns if not col.endswith('prefix')]
print (df[prefix_cols])
   prefixone
0          1
1          2
2          3
3          4

Answer 5

# endswith matches only names that end with 'prefix', as the question asks
df2 = df.drop(columns=[col for col in df.columns if col.endswith('prefix')])
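
A substring test (`'prefix' in col`) and a suffix test (`col.endswith('prefix')`) give different results as soon as a column name starts with or contains 'prefix' without ending in it. The sketch below contrasts the two with `drop`, reusing the column names from Answer 3 on a made-up one-row frame.

```python
import pandas as pd

# One-row frame; column names follow Answer 3, the values are made up.
df = pd.DataFrame([[1, 2, 3, 4]],
                  columns=['prefixcol1', 'col2prefix', 'col3prefix', 'colN'])

# Substring test: also drops 'prefixcol1', which merely starts with 'prefix'
contains = df.drop(columns=[c for c in df.columns if 'prefix' in c])

# Suffix test: drops only the names that end with 'prefix'
suffix_only = df.drop(columns=[c for c in df.columns if c.endswith('prefix')])

print(list(contains.columns))
print(list(suffix_only.columns))
```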

How to drop multiple columns that end with the same text in Pandas?
