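I have been handed a pandas DataFrame. It is full of unnecessary features that I want to drop. At the moment I am doing the following, which feels dirty. How can I do this in a more pythonic way?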
features_to_include= mydf.columns.tolist()
features_to_include=[f for f in features_to_include if 'stopword1' not in f]
features_to_include=[f for f in features_to_include if 'stopwordN' not in f]
[... 90 more of these]
features_to_include=[f for f in features_to_include if 'password1' in f]
features_to_include=[f for f in features_to_include if 'passwordN' in f]
[... 90 more of these]
EDIT: 'stopword1' and 'password1' themselves are not in X.columns. An example name in X.columns could be: feature99_stopword1
Answer 0: (score: 2)
I think you need str.contains:
L = ['stopword1','stopwordN','password1', 'passwordN']
#thanks roganjosh for suggestion
L = set(['stopword1','stopwordN','password1', 'passwordN'])
mydf = mydf.loc[:, mydf.columns.str.contains('|'.join(L))]
Example:
import pandas as pd

mydf = pd.DataFrame({'feature99_stopword1':list('abcdef'),
                     'feature99_stopword':[4,5,4,5,5,4],
                     'C':[7,8,9,4,2,3],
                     'd_stopword1':[1,3,5,7,1,0],
                     'password1':[5,3,6,9,2,4],
                     'F':list('aaabbb')})
print (mydf)
   feature99_stopword1  feature99_stopword  C  d_stopword1  password1  F
0                    a                   4  7            1          5  a
1                    b                   5  8            3          3  a
2                    c                   4  9            5          6  a
3                    d                   5  4            7          9  b
4                    e                   5  2            1          2  b
5                    f                   4  3            0          4  b
L = ['stopword1','stopwordN','password1', 'passwordN']
mydf = mydf.loc[:, mydf.columns.str.contains('|'.join(L))]
print (mydf)
   feature99_stopword1  d_stopword1  password1
0                    a            1          5
1                    b            3          3
2                    c            5          6
3                    d            7          9
4                    e            1          2
5                    f            0          4
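Note that the snippet above keeps every column whose name contains any of the listed substrings. The loop in the question does something slightly different: it drops names that contain any stopword and keeps only names that contain every password. A minimal sketch of that combination follows; the column names and the two word lists are hypothetical, and re.escape is added on the assumption that the real words might contain regex metacharacters.

```python
import re

import numpy as np
import pandas as pd

# Hypothetical column names and word lists, for illustration only.
mydf = pd.DataFrame({
    'feature99_stopword1': list('abc'),
    'feature99_password1': [4, 5, 4],
    'password1_stopwordN': [1, 3, 5],
    'C': [7, 8, 9],
})
stopwords = ['stopword1', 'stopwordN']
passwords = ['password1']

# One regex that matches if ANY stopword occurs in the column name.
stop_pattern = '|'.join(re.escape(w) for w in stopwords)
has_stop = np.asarray(mydf.columns.str.contains(stop_pattern), dtype=bool)

# A column must contain EVERY password, so AND the per-word masks together.
has_all_pass = np.ones(len(mydf.columns), dtype=bool)
for w in passwords:
    has_all_pass &= np.asarray(mydf.columns.str.contains(w, regex=False), dtype=bool)

# Keep columns without any stopword that contain all passwords.
result = mydf.loc[:, ~has_stop & has_all_pass]
print(result.columns.tolist())
```

If "contains at least one password" is what is wanted instead, the loop can be replaced by a second joined pattern, in the same way as stop_pattern.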
Answer 1: (score: 1)
You can try using filter:
df.filter(regex='password|stopword1', axis=1)
Or, if we have a list:
cols = ['password','passwordN','stopword1','stopwordN']
mydf.filter(regex='|'.join(cols), axis=1)
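filter only selects matching columns, so excluding columns needs one extra step. One possible approach, sketched here with hypothetical column names and a hypothetical stopword list, is to collect the matching names with filter and pass them to drop:

```python
import re

import pandas as pd

# Hypothetical data, for illustration only.
mydf = pd.DataFrame({
    'feature99_stopword1': [1, 2],
    'feature99_password1': [3, 4],
    'C': [5, 6],
})
stopwords = ['stopword1', 'stopwordN']

# Escape each word so it is matched literally, then join with '|'.
pattern = '|'.join(re.escape(w) for w in stopwords)

# filter() returns the matching columns; drop() removes them from the frame.
to_drop = mydf.filter(regex=pattern, axis=1).columns
cleaned = mydf.drop(columns=to_drop)
print(cleaned.columns.tolist())
```

This leaves the original mydf untouched and returns a new frame without the stopword columns.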