Question

I have a pandas DataFrame with nearly 1000 columns. I want to drop the columns whose names start with tran, can, or cad. Can anyone help?

Answer 1

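Use str.startswith together with Series.str.lower to build a boolean mask over the column names, then select with DataFrame.loc and boolean indexing, using ~ to invert the mask: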

import numpy as np
import pandas as pd

np.random.seed(100)
c = ['Tran1','t tran','aaa','can','Cad14']
df = pd.DataFrame(np.random.randint(10, size=(5,5)), columns=c)
print (df)
   Tran1  t tran  aaa  can  Cad14
0      8       8    3    7      7
1      0       4    2    5      2
2      2       2    1    0      8
3      4       0    9    6      2
4      4       1    5    3      4

mask = df.columns.str.lower().str.startswith(('tran','can','cad'))
# Alternative: a case-insensitive regex anchored at the start of each name
#mask = df.columns.str.contains('^tran|^can|^cad', case=False)
print (mask)
[ True False False  True  True]

print (~mask)
[False  True  True False False]

df1 = df.loc[:, ~mask]
print (df1)
   t tran  aaa
0       8    3
1       4    2
2       2    1
3       0    9
4       1    5
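
The steps above can be wrapped in a small reusable function. This is only a minimal sketch: the name drop_columns_with_prefix and its parameters are hypothetical and not part of pandas. It uses Python's built-in str.startswith, which accepts a tuple of prefixes.

```python
import pandas as pd


def drop_columns_with_prefix(df, prefixes, ignore_case=True):
    """Return df without the columns whose names start with any prefix.

    Hypothetical helper for illustration; not a pandas API.
    """
    names = [str(c) for c in df.columns]
    prefixes = tuple(prefixes)
    if ignore_case:
        # Compare lowercased names against lowercased prefixes.
        names = [n.lower() for n in names]
        prefixes = tuple(p.lower() for p in prefixes)
    # True for every column that should be kept.
    keep = [not n.startswith(prefixes) for n in names]
    return df.loc[:, keep]


demo = pd.DataFrame([[1, 2, 3, 4, 5]],
                    columns=['Tran1', 't tran', 'aaa', 'can', 'Cad14'])
result = drop_columns_with_prefix(demo, ['tran', 'can', 'cad'])
print(list(result.columns))
```

With ignore_case=False, only the column named can would be dropped from this sample, because Tran1 and Cad14 begin with uppercase letters.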

Answer 2

Adjust the regular expression for case sensitivity or any other requirement:

import re

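col_reg = r"^(?:tran|can|cad)"  # ^ anchors the match to the start of the column name
# Pass flags=re.IGNORECASE to re.search to also match names such as 'Tran1'.
df = df.drop([x for x in df.columns if re.search(col_reg, x)], axis=1)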
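
As a sketch of how the pattern can be adjusted: anchoring it with ^ restricts the match to the start of each name, and re.IGNORECASE makes it case-insensitive. The demo frame and the variable names below are assumptions made for illustration.

```python
import re

import pandas as pd

demo = pd.DataFrame([[1, 2, 3, 4, 5]],
                    columns=['Tran1', 't tran', 'aaa', 'can', 'Cad14'])

# Anchored, case-insensitive pattern: matches only at the start of a name.
pattern = re.compile(r'^(?:tran|can|cad)', flags=re.IGNORECASE)

to_drop = [c for c in demo.columns if pattern.search(c)]
trimmed = demo.drop(columns=to_drop)

# Unanchored, case-sensitive pattern for comparison: it matches anywhere
# in the name, so it catches 't tran' but misses 'Tran1' and 'Cad14'.
unanchored = [c for c in demo.columns if re.search('tran|can|cad', c)]

print(to_drop)
print(list(trimmed.columns))
print(unanchored)
```

The comparison list shows why the anchor matters when the requirement is "starts with": without it, any column that merely contains one of the keywords would be dropped.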
