I have a DataFrame in pandas, relation_between_countries:
  country_from country_to  points
1      Albania    Austria      10
2      Denmark    Austria       5
3      Austria    Albania       2
4       Greece     Norway       4
5       Norway     Greece       5
I am trying to work out the difference in points between the two directions of each country pair, like this:
country_from_or_to  country_to_or_from  difference
Albania             Austria             8
Denmark             Austria
Greece              Norway              -1
Any ideas on how to do this?
Answer 0 (score: 5)
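import numpy as np

# df is the question's relation_between_countries frame
cols = ['country_from','country_to']
# sort the two names within each row so both directions of a pair share one key
# (np.sort returns a 2-D array, which assigns cleanly back to both columns)
df[cols] = np.sort(df[cols].to_numpy(), axis=1)
# within each pair: points of this row minus points of the next row
df['difference'] = df.groupby(cols)['points'].diff(-1)
print(df)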
  country_from country_to  points  difference
1      Albania    Austria      10         8.0
2      Austria    Denmark       5         NaN
3      Albania    Austria       2         NaN
4       Greece     Norway       4        -1.0
5       Greece     Norway       5         NaN
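The output above keeps every original row and rewrites the country columns in sorted order. If you want one row per pair, as in the question's expected output, something along these lines might work. It is only a sketch: the sample data is rebuilt from the question, and the names pairs, first and result are made up for illustration. The sorted names are kept in a separate frame so the original orientation (Denmark, Austria) is preserved.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question (index 1..5 as shown there).
df = pd.DataFrame(
    {
        "country_from": ["Albania", "Denmark", "Austria", "Greece", "Norway"],
        "country_to": ["Austria", "Austria", "Albania", "Norway", "Greece"],
        "points": [10, 5, 2, 4, 5],
    },
    index=[1, 2, 3, 4, 5],
)
cols = ["country_from", "country_to"]

# Sorted copy of the name columns, used only as a grouping key
# (hypothetical helper frame; the original columns stay untouched).
pairs = pd.DataFrame(
    np.sort(df[cols].to_numpy(), axis=1), index=df.index, columns=cols
)

# Within each pair: points of this row minus points of the next row.
diff = df["points"].groupby([pairs[cols[0]], pairs[cols[1]]]).diff(-1)

# Keep only the first occurrence of each pair.
first = ~pairs.duplicated()
result = df.loc[first, cols].assign(difference=diff[first])
print(result)
```

Pairs that appear in only one direction, such as Denmark and Austria, keep NaN as their difference.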
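You can also replace NaN with empty strings, but the column then holds mixed values (numbers together with strings), so some functions may return unexpected output: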
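import numpy as np

cols = ['country_from','country_to']
# sort the two names within each row so both directions of a pair share one key
df[cols] = np.sort(df[cols].to_numpy(), axis=1)
# same difference as before, with NaN replaced by empty strings
df['difference'] = df.groupby(cols)['points'].diff(-1).fillna('')
print(df)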
  country_from country_to  points difference
1      Albania    Austria      10          8
2      Austria    Denmark       5
3      Albania    Austria       2
4       Greece     Norway       4         -1
5       Greece     Norway       5
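To illustrate the mixed-values caveat, here is a small sketch using the difference values from the output above. The cast to object is made explicit here as an assumption, so the example does not depend on how a particular pandas version handles filling a float column with strings. One possible alternative is to keep NaN in the data and blank it out only when printing, via the na_rep argument of to_string.

```python
import numpy as np
import pandas as pd

# The difference column from the answer, rebuilt by hand for illustration.
diff = pd.Series(
    [8.0, np.nan, np.nan, -1.0, np.nan], index=[1, 2, 3, 4, 5], name="difference"
)

# Filling with '' leaves an object column that mixes floats and strings.
# (astype(object) is an explicit cast added for this sketch.)
mixed = diff.astype(object).fillna("")
print(mixed.dtype)

# Keeping NaN preserves the float dtype, so numeric reductions still work.
total = diff.sum()
print(total)

# Blank out NaN only at display time; the underlying data stays numeric.
rendered = diff.to_frame().to_string(na_rep="")
print(rendered)
```

This way later numeric steps on the column keep working, and only the printed view hides the missing values.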