Question

For example, take the DataFrame from the answer to "Loc vs. iloc vs. ix vs. at vs. iat?".

import pandas as pd

df = pd.DataFrame(
{'age':[30, 2, 12, 4, 32, 33, 69],
 'color':['blue', 'green', 'red', 'white', 'gray', 'black', 'red'],
 'food':['Steak', 'Lamb', 'Mango', 'Apple', 'Cheese', 'Melon', 'Beans'],
 'height':[165, 70, 120, 80, 180, 172, 150],
 'score':[4.6, 8.3, 9.0, 3.3, 1.8, 9.5, 2.2],
 'state':['NY', 'TX', 'FL', 'AL', 'AK', 'TX', 'TX']},
 index=['Jane', 'Nick', 'Aaron', 'Penelope', 'Dean', 'Christina', 'Cornelia']
)

Now I want all columns except 'food' and 'height'.

I thought something like df.loc[:,['age':'color', 'score':'state']] would work, but Python returns SyntaxError: invalid syntax.

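I know there is one workaround: df.drop(columns = ['food', 'height']). However, in my real-life case I have hundreds of columns to drop, and typing out all the column names is very inefficient.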

I am hoping for something similar to dplyr::select(df, -(food:height)) or dplyr::select(df, age:color, score:state) in R.

I have also read "Selecting/Excluding sets of columns in Pandas".

Answer 1

First, find all columns between food and height (inclusive).

c = df.iloc[-1:0].loc[:, 'food':'height'].columns
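
The iloc[-1:0] step selects zero rows, so the label slice only has to carry column labels along. A minimal sketch, using a hypothetical two-row frame named demo with the same column order as the question's df, that checks the result against slicing the column Index by position:

```python
import pandas as pd

# Hypothetical small frame with the same column order as the question's df.
demo = pd.DataFrame(
    {'age': [30, 2], 'color': ['blue', 'green'], 'food': ['Steak', 'Lamb'],
     'height': [165, 70], 'score': [4.6, 8.3], 'state': ['NY', 'TX']},
    index=['Jane', 'Nick'],
)

# iloc[-1:0] yields an empty frame that still has every column.
empty = demo.iloc[-1:0]
c = empty.loc[:, 'food':'height'].columns

# The same labels taken by position from the column Index (stop made inclusive).
cols = demo.columns
c_alt = cols[cols.get_loc('food'):cols.get_loc('height') + 1]

print(list(c), list(c_alt))
```

Both routes assume the column labels are unique, so that get_loc returns a single integer.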

Next, filter those columns out using difference, isin, or setdiff1d:

df[df.columns.difference(c)]

Or,

df.loc[:, ~df.columns.isin(c)]

Or,

df[np.setdiff1d(df.columns, c)]  # needs: import numpy as np

           age  color  score state
Jane        30   blue    4.6    NY
Nick         2  green    8.3    TX
Aaron       12    red    9.0    FL
Penelope     4  white    3.3    AL
Dean        32   gray    1.8    AK
Christina   33  black    9.5    TX
Cornelia    69    red    2.2    TX
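
One caveat worth checking: Index.difference (with its default sort setting) and np.setdiff1d return the remaining labels sorted, while the isin mask keeps the original column order. The question's columns happen to be alphabetical, so all three give the same output there. A small sketch with a hypothetical frame demo whose columns are not alphabetical illustrates the difference:

```python
import numpy as np
import pandas as pd

# Hypothetical frame whose columns are deliberately not in alphabetical order.
demo = pd.DataFrame([[1, 2, 3, 4]], columns=['state', 'food', 'height', 'age'])

# Columns to exclude: everything from 'food' through 'height', inclusive.
c = demo.loc[:, 'food':'height'].columns

by_difference = list(demo.columns.difference(c))      # sorted labels
by_setdiff = list(np.setdiff1d(demo.columns, c))      # sorted labels
by_mask = list(demo.columns[~demo.columns.isin(c)])   # original order kept

print(by_difference, by_setdiff, by_mask)
```

If the original column order matters, the isin variant is probably the safer choice.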

Answer 2

First get the positions of the column names with Index.get_loc, then use numpy.r_ to join all the slicers together:

import numpy as np

# +1 makes each stop position inclusive, matching label-based slicing
a = np.r_[df.columns.get_loc('age'):df.columns.get_loc('color')+1,
          df.columns.get_loc('score'):df.columns.get_loc('state')+1]

df = df.iloc[:, a]
print(df)
           age  color  score state
Jane        30   blue    4.6    NY
Nick         2  green    8.3    TX
Aaron       12    red    9.0    FL
Penelope     4  white    3.3    AL
Dean        32   gray    1.8    AK
Christina   33  black    9.5    TX
Cornelia    69    red    2.2    TX
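
When many label ranges are involved, the same idea can be wrapped in a small helper. The sketch below assumes unique column labels (so that get_loc returns an integer) and at least one range; the name select_ranges and the frame demo are hypothetical, not part of pandas:

```python
import numpy as np
import pandas as pd

def select_ranges(frame, *ranges):
    """Return the columns covered by inclusive (start_label, stop_label) pairs."""
    cols = frame.columns
    # Build one positional slice per label range, then let np.r_ concatenate them.
    positions = np.r_[tuple(
        slice(cols.get_loc(start), cols.get_loc(stop) + 1)
        for start, stop in ranges
    )]
    return frame.iloc[:, positions]

# Hypothetical one-row frame with the question's column order.
demo = pd.DataFrame(
    [[30, 'blue', 'Steak', 165, 4.6, 'NY']],
    columns=['age', 'color', 'food', 'height', 'score', 'state'],
)
result = select_ranges(demo, ('age', 'color'), ('score', 'state'))
print(list(result.columns))
```

Because the selection is positional, the original column order is preserved and the input frame is not modified.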
