I have a pandas DataFrame with three columns, part of which is shown below:
import pandas as pd
data = {'T1': {0: 'Belarus', 1: 'Netherlands', 2: 'France', 3: 'Faroe Islands',
4: 'Hungary'}, 'T2': {0: 'Sweden', 1: 'Bulgaria', 2: 'Luxembourg',
3: 'Andorra', 4: 'Portugal'}, 'score': {0: -4, 1: 2, 2: 0, 3: 1, 4: -1}}
df = pd.DataFrame(data)
# T1 T2 score
#0 Belarus Sweden -4
#1 Netherlands Bulgaria 2
#2 France Luxembourg 0
#3 Faroe Islands Andorra 1
#4 Hungary Portugal -1
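For any row where the items T1 and T2 are not in alphabetical order (for example "Netherlands" and "Bulgaria"), I want to swap the two items and flip the sign of score.
I was able to come up with this monstrosity: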
df.apply(lambda x:
pd.Series([x["T2"], x["T1"], -x["score"]])
if (x["T1"] > x["T2"])
else pd.Series([x["T1"], x["T2"], x["score"]]),
axis=1)
# 0 1 2
#0 Belarus Sweden -4
#1 Bulgaria Netherlands -2
#2 France Luxembourg 0
#3 Andorra Faroe Islands -1
#4 Hungary Portugal -1
Is there a better way to get the same result? (Performance is not a concern.)
Answer 0: (score: 4)
Not as neat as @cᴏʟᴅsᴘᴇᴇᴅ's answer, but it works:
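import numpy as np
df1 = df[['T1', 'T2']].copy()
# sort each row's pair of names alphabetically
df1[['T1', 'T2']] = np.sort(df1.values, axis=1)
# negate the score on rows where sorting changed the order
df1['new'] = np.where((df1 != df[['T1', 'T2']]).any(axis=1), -df.score, df.score)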
df1
Out[102]:
T1 T2 new
0 Belarus Sweden -4
1 Bulgaria Netherlands -2
2 France Luxembourg 0
3 Andorra Faroe Islands -1
4 Hungary Portugal -1
Answer 1: (score: 3)
Option 1
Boolean indexing.
m = df.T1 > df.T2
m
0 False
1 True
2 False
3 True
4 False
dtype: bool
df.loc[m, 'score'] = df.loc[m, 'score'].mul(-1)
df.loc[m, ['T1', 'T2']] = df.loc[m, ['T2', 'T1']].values
df
T1 T2 score
0 Belarus Sweden -4
1 Bulgaria Netherlands -2
2 France Luxembourg 0
3 Andorra Faroe Islands -1
4 Hungary Portugal -1
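The mask-based swap above modifies df in place. If the original frame needs to stay untouched, the same steps can be wrapped in a small helper that works on a copy. This is only a sketch: the function name order_pairs and its parameters are made up for illustration and are not part of pandas.

```python
import pandas as pd

def order_pairs(df, a="T1", b="T2", score="score"):
    """Hypothetical helper: return a copy in which df[a] <= df[b] on every row,
    with the score negated on the rows that had to be swapped."""
    out = df.copy()
    m = out[a] > out[b]                              # rows that are out of order
    out.loc[m, [a, b]] = out.loc[m, [b, a]].values   # swap the two name columns
    out.loc[m, score] = -out.loc[m, score]           # flip the sign on those rows
    return out

df = pd.DataFrame({
    "T1": ["Belarus", "Netherlands", "France", "Faroe Islands", "Hungary"],
    "T2": ["Sweden", "Bulgaria", "Luxembourg", "Andorra", "Portugal"],
    "score": [-4, 2, 0, 1, -1],
})
result = order_pairs(df)
```

Here result holds the reordered rows while df keeps its original values.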
Option 2
df.eval
m = df.eval('T1 > T2')
df.loc[m, 'score'] = df.loc[m, 'score'].mul(-1)
df.loc[m, ['T1', 'T2']] = df.loc[m, ['T2', 'T1']].values
df
T1 T2 score
0 Belarus Sweden -4
1 Bulgaria Netherlands -2
2 France Luxembourg 0
3 Andorra Faroe Islands -1
4 Hungary Portugal -1
Option 3
df.query
idx = df.query('T1 > T2').index
idx
Int64Index([1, 3], dtype='int64')
df.loc[idx, 'score'] = df.loc[idx, 'score'].mul(-1)
df.loc[idx, ['T1', 'T2']] = df.loc[idx, ['T2', 'T1']].values
df
T1 T2 score
0 Belarus Sweden -4
1 Bulgaria Netherlands -2
2 France Luxembourg 0
3 Andorra Faroe Islands -1
4 Hungary Portugal -1
Answer 2: (score: 3)
Here is a fun and creative way using numpy tools:
t = df[['T1', 'T2']].values
a = t.argsort(1)
df[['T1', 'T2']] = t[np.arange(len(t))[:, None], a]
# @ is python 3.5 thx @cᴏʟᴅsᴘᴇᴇᴅ
# otherwise use
# df['score'] *= a.dot([-1, 1])
df['score'] *= a @ [-1, 1]
df
T1 T2 score
0 Belarus Sweden -4
1 Bulgaria Netherlands -2
2 France Luxembourg 0
3 Andorra Faroe Islands -1
4 Hungary Portugal -1
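Why the dot product with [-1, 1] gives the right sign may not be obvious at first. With two columns, each row of the argsort result is either [0, 1] (already in order) or [1, 0] (needs swapping), so the product is 0*(-1) + 1*1 = 1 or 1*(-1) + 0*1 = -1. A two-row sketch, using a small array made up for illustration:

```python
import numpy as np

# Two sample rows: the first is already in order, the second is not.
t = np.array([["Belarus", "Sweden"],
              ["Netherlands", "Bulgaria"]], dtype=object)

a = t.argsort(1)       # per-row column order that sorts each pair
sign = a @ [-1, 1]     # [0, 1] -> 1 (keep), [1, 0] -> -1 (flip)

# Fancy indexing with the row numbers and `a` reorders each row.
sorted_pairs = t[np.arange(len(t))[:, None], a]
```

Multiplying the score column by sign therefore leaves ordered rows alone and negates the swapped ones.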
Answer 3: (score: 2)
Using loc
cond = df.T1 > df.T2
df.loc[cond, 'score'] = df['score'] *-1
df.loc[cond, ['T1', 'T2']] = df.loc[cond, ['T2', 'T1']].values
Output
T1 T2 score
0 Belarus Sweden -4
1 Bulgaria Netherlands -2
2 France Luxembourg 0
3 Andorra Faroe Islands -1
4 Hungary Portugal -1
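The two loc assignments above behave differently with respect to alignment, which is worth a short sketch. Assigning a Series or DataFrame through loc aligns it on the row and column labels first, so the full-length df['score'] * -1 only touches the masked rows, whereas the column swap needs .values to drop the labels. The small two-row frame below is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "T1": ["Netherlands", "France"],
    "T2": ["Bulgaria", "Luxembourg"],
    "score": [2, 5],
})
cond = df.T1 > df.T2   # True only for the first row

# Without .values the right-hand side is aligned back onto the
# T1/T2 column labels, so nothing is actually swapped.
no_swap = df.copy()
no_swap.loc[cond, ["T1", "T2"]] = no_swap.loc[cond, ["T2", "T1"]]

# The full-length Series is aligned on the index, so only rows
# selected by cond receive the negated score.
df.loc[cond, "score"] = df["score"] * -1

# .values strips the labels, so the swap really happens.
df.loc[cond, ["T1", "T2"]] = df.loc[cond, ["T2", "T1"]].values
```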