Suppose I have a data frame like this:
Date Artist percent_gray percent_blue percent_black percent_red
33 Leonardo 22 33 36 46
45 Leonardo 23 47 23 14
46 Leonardo 13 34 33 12
23 Michelangelo 28 19 38 25
25 Michelangelo 24 56 55 13
26 Michelangelo 21 22 45 13
13 Titian 24 17 23 22
16 Titian 45 43 44 13
19 Titian 17 45 56 13
24 Raphael 34 34 34 45
27 Raphael 31 22 25 67
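To reproduce the problem, the sample table above can be built directly. This is a convenience sketch; the values are copied from the table, and only the variable name `df` is assumed.

```python
import pandas as pd

# Sample data copied from the table above
df = pd.DataFrame({
    'Date': [33, 45, 46, 23, 25, 26, 13, 16, 19, 24, 27],
    'Artist': ['Leonardo'] * 3 + ['Michelangelo'] * 3 + ['Titian'] * 3 + ['Raphael'] * 2,
    'percent_gray': [22, 23, 13, 28, 24, 21, 24, 45, 17, 34, 31],
    'percent_blue': [33, 47, 34, 19, 56, 22, 17, 43, 45, 34, 22],
    'percent_black': [36, 23, 33, 38, 55, 45, 23, 44, 56, 34, 25],
    'percent_red': [46, 14, 12, 25, 13, 13, 22, 13, 13, 45, 67],
})
print(df)
```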
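I want to get the maximum colour difference between different pictures by the same artist. Different colours may be compared with each other as well (e.g. percent_gray with percent_blue). For example, for Leonardo the biggest difference is percent_red (date: 46) - percent_blue (date: 45) = 12 - 47 = -35, i.e. the difference with the largest absolute value, keeping its sign. I want to see how this develops over time, so I only want to compare an artist's newer pictures with their older ones (here, the third picture can be compared with the first and second, and the second picture only with the first) and get the maximum difference. So the data frame should look like: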
Date Artist max_d
33 Leonardo NaN
45 Leonardo -32
46 Leonardo -35
23 Michelangelo NaN
25 Michelangelo 37
26 Michelangelo -43
13 Titian NaN
16 Titian 28
19 Titian 43
24 Raphael NaN
27 Raphael 33
I think I have to use groupby, but I can't get the output I want.
Answer 0 (score: 2)
You can use:
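import numpy as np

#in real data, sort first so that each artist's pictures are in Date order
#(the sample is already in that order; the output below was printed without this sort)
df = df.sort_values(['Artist', 'Date'])
#row-wise min and max of the colour columns
mi = df.iloc[:,2:].min(axis=1)
ma = df.iloc[:,2:].max(axis=1)
#min and max of the previous picture by the same artist
ma1 = ma.groupby(df['Artist']).shift()
mi1 = mi.groupby(df['Artist']).shift()
#most negative and most positive possible difference
mad1 = mi - ma1
mad2 = ma - mi1
#keep whichever has the larger absolute value
df['max_d'] = np.where(mad1.abs() > mad2.abs(), mad1, mad2)
print (df)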
Date Artist percent_gray percent_blue percent_black \
0 33 Leonardo 22 33 36
1 45 Leonardo 23 47 23
2 46 Leonardo 13 34 33
3 23 Michelangelo 28 19 38
4 25 Michelangelo 24 56 55
5 26 Michelangelo 21 22 45
6 13 Titian 24 17 23
7 16 Titian 45 43 44
8 19 Titian 17 45 56
9 24 Raphael 34 34 34
10 27 Raphael 31 22 25
percent_red max_d
0 46 NaN
1 14 -32.0
2 12 -35.0
3 25 NaN
4 13 37.0
5 13 -43.0
6 22 NaN
7 13 28.0
8 13 43.0
9 45 NaN
10 67 33.0
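The answer compares each picture only with the immediately preceding one (`shift`), while the question asks to compare against all earlier pictures by the artist. On this sample the two give the same result, but they can differ in general. A possible variant (a sketch, not part of the original answer) takes the running max/min over earlier pictures with `cummax`/`cummin` and then shifts by one:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Date': [33, 45, 46, 23, 25, 26, 13, 16, 19, 24, 27],
    'Artist': ['Leonardo'] * 3 + ['Michelangelo'] * 3 + ['Titian'] * 3 + ['Raphael'] * 2,
    'percent_gray': [22, 23, 13, 28, 24, 21, 24, 45, 17, 34, 31],
    'percent_blue': [33, 47, 34, 19, 56, 22, 17, 43, 45, 34, 22],
    'percent_black': [36, 23, 33, 38, 55, 45, 23, 44, 56, 34, 25],
    'percent_red': [46, 14, 12, 25, 13, 13, 22, 13, 13, 45, 67],
})

# Each artist's pictures must be in Date order for the running max/min
df = df.sort_values(['Artist', 'Date'])
colors = df.filter(like='percent_')
by_artist = df['Artist']

# Row-wise min and max over the colour columns
mi = colors.min(axis=1)
ma = colors.max(axis=1)

# Running max/min over ALL earlier pictures of the same artist:
# cummax/cummin include the current row, so shift by one within each artist
prev_ma = ma.groupby(by_artist).cummax().groupby(by_artist).shift()
prev_mi = mi.groupby(by_artist).cummin().groupby(by_artist).shift()

mad1 = mi - prev_ma  # most negative possible difference
mad2 = ma - prev_mi  # most positive possible difference
df['max_d'] = np.where(mad1.abs() > mad2.abs(), mad1, mad2)
print(df.sort_index()[['Date', 'Artist', 'max_d']])
```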
Explanation (with helper columns):
#colour columns only, selected before any helper columns are added
colors = df.filter(like='percent_')
#get min and max per row
df['min'] = colors.min(axis=1)
df['max'] = colors.max(axis=1)
#get shifted min and max by Artist
df['max1'] = df.groupby('Artist')['max'].shift()
df['min1'] = df.groupby('Artist')['min'].shift()
#get differences
df['max_d1'] = df['min'] - df['max1']
df['max_d2'] = df['max'] - df['min1']
#pick the difference with the larger absolute value
df['max_d'] = np.where(df['max_d1'].abs() > df['max_d2'].abs(), df['max_d1'], df['max_d2'])
print (df)
percent_red min max max1 min1 max_d1 max_d2 max_d
0 46 22 46 NaN NaN NaN NaN NaN
1 14 14 47 46.0 22.0 -32.0 25.0 -32.0
2 12 12 34 47.0 14.0 -35.0 20.0 -35.0
3 25 19 38 NaN NaN NaN NaN NaN
4 13 13 56 38.0 19.0 -25.0 37.0 37.0
5 13 13 45 56.0 13.0 -43.0 32.0 -43.0
6 22 17 24 NaN NaN NaN NaN NaN
7 13 13 45 24.0 17.0 -11.0 28.0 28.0
8 13 13 56 45.0 13.0 -32.0 43.0 43.0
9 45 34 45 NaN NaN NaN NaN NaN
10 67 22 67 45.0 34.0 -23.0 33.0 33.0
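Why the row-wise min and max are enough (a short derivation added here, not stated in the original answer): let a_1, ..., a_k be the colour percentages of the newer picture and b_1, ..., b_k those of the earlier one. Every cross-colour difference lies between two bounds, and both bounds are attained by some pair (i, j):

```latex
\min_i a_i - \max_j b_j \;\le\; a_i - b_j \;\le\; \max_i a_i - \min_j b_j
\qquad \text{for all } i, j
```

The value with the largest absolute value in an interval sits at one of its endpoints, so only these two candidates need comparing; in the helper-column names above they are max_d1 = min - max1 and max_d2 = max - min1. For Leonardo's third picture: 12 - 47 = -35 versus 34 - 14 = 20, so -35 is kept.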
If you use the second (explanatory) version, drop the helper columns afterwards:
df = df.drop(['min','max','max1','min1','max_d1', 'max_d2'], axis=1)
print (df)
Date Artist percent_gray percent_blue percent_black \
0 33 Leonardo 22 33 36
1 45 Leonardo 23 47 23
2 46 Leonardo 13 34 33
3 23 Michelangelo 28 19 38
4 25 Michelangelo 24 56 55
5 26 Michelangelo 21 22 45
6 13 Titian 24 17 23
7 16 Titian 45 43 44
8 19 Titian 17 45 56
9 24 Raphael 34 34 34
10 27 Raphael 31 22 25
percent_red max_d
0 46 NaN
1 14 -32.0
2 12 -35.0
3 25 NaN
4 13 37.0
5 13 -43.0
6 22 NaN
7 13 28.0
8 13 43.0
9 45 NaN
10 67 33.0
Answer 1 (score: 1)
How about a custom apply function? Does this help?
from numbers import Number

from django.shortcuts import render

from .models import Aircraft  # assumed location of the Aircraft model

def aircraft_delta(request):
    # 'ids' is read as a comma-separated list, e.g. ?ids=3,17
    # (iterating the raw string character by character breaks multi-digit ids)
    ids = [i for i in request.GET.get('ids', '').split(',') if i]
    aircraft_to_compare = Aircraft.objects.filter(id__in=ids)
    property_keys = ['name', 'manufacturer', 'aircraft_type', 'body', 'engines',
                     'image', 'cost', 'maximum_range', 'passengers', 'maximum_altitude',
                     'cruising_speed', 'fuel_capacity', 'description', 'wing_span', 'length']
    column_descriptions = {
        'image': '',
        'name': 'Aircraft',
        'maximum_range': 'Range (NM)',
        'passengers': 'Passengers',
        'cruising_speed': 'Max Speed (kts)',
        'fuel_capacity': 'Fuel Capacity',
        'aircraft_type': 'Type',
        'body': 'Body',
        'engines': 'Engines',
        'cost': 'Cost',
        'maximum_altitude': 'Maximum Altitude',
        'description': 'Description',
        'manufacturer': 'Manufacturer',
        'wing_span': 'Wing Span (FT)',
        'length': 'Total Length (FT)',
    }
    data = []
    for key in property_keys:
        row = [column_descriptions[key]]
        first_value = getattr(aircraft_to_compare[0], key)
        second_value = getattr(aircraft_to_compare[1], key)
        # only numeric fields get a delta; text and image fields are left blank
        # (assumes fields such as description and manufacturer are not numbers)
        if isinstance(first_value, Number) and isinstance(second_value, Number):
            delta = abs(first_value - second_value)
        else:
            delta = ''
        row.append(first_value)
        row.append(delta)
        row.append(second_value)
        data.append(row)
    return render(request, 'aircraft/aircraft_delta.html', {'data': data})