Reordering selected items in two pandas DataFrame columns

Date: 2017-09-15 03:09:35

Tags: python pandas dataframe boolean

I have a pandas DataFrame with three columns, part of which looks like this:

import pandas as pd

data = {'T1': {0: 'Belarus', 1: 'Netherlands', 2: 'France', 3: 'Faroe Islands',
        4: 'Hungary'}, 'T2': {0: 'Sweden', 1: 'Bulgaria', 2: 'Luxembourg',
        3: 'Andorra', 4: 'Portugal'}, 'score': {0: -4, 1: 2, 2: 0, 3: 1, 4: -1}}
df = pd.DataFrame(data)
#           T1             T2  score
#0        Belarus      Sweden     -4
#1    Netherlands    Bulgaria      2
#2         France  Luxembourg      0
#3  Faroe Islands     Andorra      1
#4        Hungary    Portugal     -1

对于项T1T2不在字母顺序中的任何行(例如"Netherlands""Bulgaria"),我想要交换项目并更改score的标志。

I was able to come up with a monstrosity:

df.apply(lambda x: 
          pd.Series([x["T2"], x["T1"], -x["score"]]) 
          if (x["T1"] > x["T2"]) 
          else pd.Series([x["T1"], x["T2"], x["score"]]), 
         axis=1)
#          0              1  2
#0   Belarus         Sweden -4
#1  Bulgaria    Netherlands -2
#2    France     Luxembourg  0
#3   Andorra  Faroe Islands -1
#4   Hungary       Portugal -1
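The result's columns come out as `0`, `1`, `2` because each `pd.Series` built inside the lambda has a default integer index, and `apply(axis=1)` uses that index for the column labels. A minimal sketch of the same `apply` with `index=` passed in both branches keeps the original names (the list `cols` is introduced here for illustration):

```python
import pandas as pd

data = {'T1': {0: 'Belarus', 1: 'Netherlands', 2: 'France', 3: 'Faroe Islands', 4: 'Hungary'},
        'T2': {0: 'Sweden', 1: 'Bulgaria', 2: 'Luxembourg', 3: 'Andorra', 4: 'Portugal'},
        'score': {0: -4, 1: 2, 2: 0, 3: 1, 4: -1}}
df = pd.DataFrame(data)

cols = ['T1', 'T2', 'score']  # index of each returned Series becomes the result's columns
result = df.apply(lambda x:
                  pd.Series([x['T2'], x['T1'], -x['score']], index=cols)  # swapped, sign flipped
                  if x['T1'] > x['T2']
                  else pd.Series([x['T1'], x['T2'], x['score']], index=cols),
                  axis=1)
print(result)
```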

Is there a better way to get the same result? (Performance is not a concern.)

4 answers:

Answer 0 (score: 4)

Not as neat as @cᴏʟᴅsᴘᴇᴇᴅ's, but it works

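import numpy as np
df1 = df[['T1', 'T2']].copy()
df1[['T1', 'T2']] = np.sort(df1.values, axis=1)  # sort the two names within each row
# rows that differ from the original were swapped, so negate their score
df1['new'] = np.where((df1 != df[['T1', 'T2']]).any(axis=1), -df.score, df.score)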

df1
Out[102]: 
         T1             T2  new
0   Belarus         Sweden   -4
1  Bulgaria    Netherlands   -2
2    France     Luxembourg    0
3   Andorra  Faroe Islands   -1
4   Hungary       Portugal   -1
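The answer's logic rests on one observation: after each row's pair is sorted, a row differs from the original only if its two names were out of order, so `.any(axis=1)` over the element-wise `!=` marks exactly the rows whose score needs negating. A small NumPy-only sketch on two of the sample rows shows the intermediate values:

```python
import numpy as np

# Two rows from the question: one already in order, one out of order
pairs = np.array([['Belarus', 'Sweden'],
                  ['Netherlands', 'Bulgaria']], dtype=object)

sorted_pairs = np.sort(pairs, axis=1)            # sort names within each row
changed = (sorted_pairs != pairs).any(axis=1)    # True where sorting moved something
scores = np.array([-4, 2])
new_scores = np.where(changed, -scores, scores)  # negate only the swapped rows

print(sorted_pairs.tolist())  # [['Belarus', 'Sweden'], ['Bulgaria', 'Netherlands']]
print(changed.tolist())       # [False, True]
print(new_scores.tolist())    # [-4, -2]
```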

Answer 1 (score: 3)

Option 1
Boolean indexing.

m = df.T1 > df.T2
m 

0    False
1     True
2    False
3     True
4    False
dtype: bool

df.loc[m, 'score'] = df.loc[m, 'score'].mul(-1)
df.loc[m, ['T1', 'T2']] = df.loc[m, ['T2', 'T1']].values
df

         T1             T2  score
0   Belarus         Sweden     -4
1  Bulgaria    Netherlands     -2
2    France     Luxembourg      0
3   Andorra  Faroe Islands     -1
4   Hungary       Portugal     -1
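The `.values` on the right-hand side of the swap is what makes it work. When the right-hand side of a `.loc` assignment is a DataFrame, pandas aligns it on both the index and the column labels before writing, so without `.values` each column would land back where it came from and nothing would swap. A plain array skips that alignment. A minimal sketch on one out-of-order row (`.to_numpy()` behaves the same as `.values` here):

```python
import pandas as pd

df = pd.DataFrame({'T1': ['Netherlands'], 'T2': ['Bulgaria'], 'score': [2]})
m = df.T1 > df.T2

# Right-hand side is a DataFrame: pandas realigns it by column label, so nothing swaps
aligned = df.copy()
aligned.loc[m, ['T1', 'T2']] = aligned.loc[m, ['T2', 'T1']]

# Right-hand side is a bare array: values are written positionally, so the pair swaps
swapped = df.copy()
swapped.loc[m, ['T1', 'T2']] = swapped.loc[m, ['T2', 'T1']].values

print(aligned.loc[0, 'T1'], swapped.loc[0, 'T1'])  # Netherlands Bulgaria
```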

Option 2
df.eval

m = df.eval('T1 > T2')
df.loc[m, 'score'] = df.loc[m, 'score'].mul(-1)
df.loc[m, ['T1', 'T2']] = df.loc[m, ['T2', 'T1']].values
df

         T1             T2  score
0   Belarus         Sweden     -4
1  Bulgaria    Netherlands     -2
2    France     Luxembourg      0
3   Andorra  Faroe Islands     -1
4   Hungary       Portugal     -1

Option 3
df.query

idx = df.query('T1 > T2').index
idx
Int64Index([1, 3], dtype='int64')

df.loc[idx, 'score'] = df.loc[idx, 'score'].mul(-1)
df.loc[idx, ['T1', 'T2']] = df.loc[idx, ['T2', 'T1']].values
df

         T1             T2  score
0   Belarus         Sweden     -4
1  Bulgaria    Netherlands     -2
2    France     Luxembourg      0
3   Andorra  Faroe Islands     -1
4   Hungary       Portugal     -1
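All three options share the same two assignments and differ only in how the rows are selected. A sketch that factors the shared part into a hypothetical helper, `swap_unordered`, which works on a copy instead of mutating `df` and accepts any of the three selectors (a boolean mask or an index of row labels):

```python
import pandas as pd

def swap_unordered(frame, rows):
    """Hypothetical helper: swap T1/T2 and negate score for the selected rows.

    `rows` may be a boolean Series or an Index of row labels; .loc accepts both.
    """
    out = frame.copy()  # leave the caller's frame untouched
    out.loc[rows, 'score'] = out.loc[rows, 'score'].mul(-1)
    out.loc[rows, ['T1', 'T2']] = out.loc[rows, ['T2', 'T1']].values
    return out

df = pd.DataFrame({'T1': ['Belarus', 'Netherlands', 'France', 'Faroe Islands', 'Hungary'],
                   'T2': ['Sweden', 'Bulgaria', 'Luxembourg', 'Andorra', 'Portugal'],
                   'score': [-4, 2, 0, 1, -1]})

by_mask = swap_unordered(df, df.T1 > df.T2)
by_eval = swap_unordered(df, df.eval('T1 > T2'))
by_query = swap_unordered(df, df.query('T1 > T2').index)
print(by_mask)
```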

Answer 2 (score: 3)

Here's a fun, creative way to do it with numpy tools

import numpy as np

t = df[['T1', 'T2']].values
a = t.argsort(1)  # per row: the column order that sorts the pair

df[['T1', 'T2']] = t[np.arange(len(t))[:, None], a]
# @ requires Python 3.5+ (thanks @cᴏʟᴅsᴘᴇᴇᴅ); otherwise use
# df['score'] *= a.dot([-1, 1])
df['score'] *= a @ [-1, 1]

df

         T1             T2  score
0   Belarus         Sweden     -4
1  Bulgaria    Netherlands     -2
2    France     Luxembourg      0
3   Andorra  Faroe Islands     -1
4   Hungary       Portugal     -1
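Two pieces of that answer carry the logic. `t.argsort(1)` gives, for each row, the column order that sorts the pair: `[0, 1]` when the names are already in order and `[1, 0]` when they are not. Indexing with `np.arange(len(t))[:, None]` (a column of row numbers) together with `a` broadcasts so that element `[i, j]` of the result is `t[i, a[i, j]]`, reordering every row at once. Finally `a @ [-1, 1]` computes `-a[i, 0] + a[i, 1]`, which is `1` for `[0, 1]` and `-1` for `[1, 0]`, exactly the sign factor for `score`. A two-row sketch:

```python
import numpy as np

t = np.array([['Belarus', 'Sweden'],
              ['Netherlands', 'Bulgaria']], dtype=object)

a = t.argsort(1)                   # [[0, 1], [1, 0]]: per-row sorting order
rows = np.arange(len(t))[:, None]  # [[0], [1]]: broadcasts against a
reordered = t[rows, a]             # element [i, j] is t[i, a[i, j]]
signs = a @ [-1, 1]                # -a[:, 0] + a[:, 1]

print(reordered.tolist())  # [['Belarus', 'Sweden'], ['Bulgaria', 'Netherlands']]
print(signs.tolist())      # [1, -1]
```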

Answer 3 (score: 2)

Using loc

cond = df.T1 > df.T2
df.loc[cond, 'score'] = df['score'] * -1
df.loc[cond, ['T1', 'T2']] = df.loc[cond, ['T2', 'T1']].values

Output

    T1          T2              score
0   Belarus     Sweden          -4
1   Bulgaria    Netherlands     -2
2   France      Luxembourg       0
3   Andorra     Faroe Islands   -1
4   Hungary     Portugal        -1
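This works even though the right-hand side `df['score'] * -1` is the whole negated column: when a Series is assigned through `.loc`, pandas aligns it on the index, so only the labels selected by `cond` receive new values and the other rows keep their original scores. A minimal sketch with the question's scores and a mask selecting rows 1 and 3:

```python
import pandas as pd

df = pd.DataFrame({'score': [-4, 2, 0, 1, -1]})
cond = pd.Series([False, True, False, True, False])  # the rows where T1 > T2 in the question

# The right-hand side covers all five rows; index alignment writes only rows 1 and 3
df.loc[cond, 'score'] = df['score'] * -1
print(df['score'].tolist())  # [-4, -2, 0, -1, -1]
```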