I have the following pandas DataFrame:
Variable_1 Variable_2 Variable_3 Target
G M I 230
G M I 231
G M I 233
G M I 231
G M I 230
G M I 214
G M L 211
G M L 212
G M L 123
G M L 345
G N J 32
G N J 123
G N J 234
G N O 2345
G N O 432
G N O 455
G N O 543
G N O 333
Let's consider only Variable_3. For each category of Variable_3, I want to compare the last value of Target with the first value of Target: Output should be 1 if the last value is greater than the first, and -1 if it is smaller.
For the example above, the resulting dataset should look like this:
Variable_1 Variable_2 Variable_3 Target Output
G M I 230 -1
G M I 231 -1
G M I 233 -1
G M I 231 -1
G M I 230 -1
G M I 214 -1
G M L 211 1
G M L 212 1
G M L 123 1
G M L 345 1
G N J 32 1
G N J 123 1
G N J 234 1
G N O 2345 -1
G N O 432 -1
G N O 455 -1
G N O 543 -1
G N O 333 -1
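To make the example reproducible, here is a minimal sketch that re-types the sample table as a DataFrame together with the expected Output column. The names `df` and `expected` are only illustrative, and because the sample has no group whose first and last Target are equal, scoring a tie as 1 is an assumption.

```python
import pandas as pd

# Sample data re-typed from the table in the question
df = pd.DataFrame({
    "Variable_1": ["G"] * 18,
    "Variable_2": ["M"] * 10 + ["N"] * 8,
    "Variable_3": ["I"] * 6 + ["L"] * 4 + ["J"] * 3 + ["O"] * 5,
    "Target": [230, 231, 233, 231, 230, 214,
               211, 212, 123, 345,
               32, 123, 234,
               2345, 432, 455, 543, 333],
})

# Expected Output from the question: -1 where the group's first Target
# exceeds its last Target, 1 otherwise (ties assumed to map to 1)
expected = [-1] * 6 + [1] * 4 + [1] * 3 + [-1] * 5
```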
Answer 0 (score: 1)
Group the data by Variable_3, take the first and last Target in each group, and compare them:
groups = df.groupby('Variable_3')['Target']
output = groups.first() > groups.last()
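To see what `output` holds at this point, here is a small sketch on a re-typed copy of the sample data (only the two columns this step uses). It is a boolean Series indexed by the group labels, which `groupby` sorts by default, and it keeps the name Target, which is why the following join needs a suffix.

```python
import pandas as pd

# Re-typed sample data; Variable_1 and Variable_2 are not needed here
df = pd.DataFrame({
    "Variable_3": ["I"] * 6 + ["L"] * 4 + ["J"] * 3 + ["O"] * 5,
    "Target": [230, 231, 233, 231, 230, 214, 211, 212, 123, 345,
               32, 123, 234, 2345, 432, 455, 543, 333],
})

groups = df.groupby("Variable_3")["Target"]
# One boolean per group: True where the first Target is larger than the last
output = groups.first() > groups.last()
print(output)
```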
Join the result back onto the original DataFrame, using Variable_3 as the index:
df = df.set_index('Variable_3').join(output, rsuffix='_r').reset_index()
Convert the booleans to 1s and -1s (True, i.e. first > last, becomes -1):
import numpy as np
df['Target_r'] = np.where(df['Target_r'], -1, 1)
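A tiny sketch of what `np.where` does in this step, using a hand-made boolean Series in place of the joined Target_r column (the name `flags` is only illustrative):

```python
import numpy as np
import pandas as pd

# Stand-in for the joined Target_r column (True = first Target > last Target)
flags = pd.Series([True, True, False, False])

# np.where takes -1 where the condition is True and 1 where it is False
result = np.where(flags, -1, 1).tolist()  # [-1, -1, 1, 1]
```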
Finally, rename the new column:
df.rename(columns={'Target_r' : 'Output'}, inplace=True)
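Note that the `set_index(...)`/`reset_index()` round trip moves Variable_3 to the first column. If the original column order matters, one possible variant (a swapped-in technique, not part of this answer) looks up each row's group flag with `Series.map` instead of joining, so no columns move:

```python
import numpy as np
import pandas as pd

# Sample data re-typed from the question
df = pd.DataFrame({
    "Variable_1": ["G"] * 18,
    "Variable_2": ["M"] * 10 + ["N"] * 8,
    "Variable_3": ["I"] * 6 + ["L"] * 4 + ["J"] * 3 + ["O"] * 5,
    "Target": [230, 231, 233, 231, 230, 214, 211, 212, 123, 345,
               32, 123, 234, 2345, 432, 455, 543, 333],
})

groups = df.groupby("Variable_3")["Target"]
# True for groups whose first Target is greater than their last Target
decreased = groups.first() > groups.last()

# Map each row's Variable_3 to its group flag, then encode True -> -1, False -> 1
df["Output"] = np.where(df["Variable_3"].map(decreased), -1, 1)
print(df)
```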
Answer 1 (score: 1)
Try:
# -1 when the group's first Target is larger than its last Target, else 1
df['Output'] = df.groupby('Variable_3')['Target'] \
    .transform(lambda x: -1 if x.iloc[0] > x.iloc[-1] else 1)
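A self-contained check of the one-liner against the expected Output column from the question, with the data re-typed from the table (`expected` is an illustrative name). `transform` broadcasts the scalar returned for each group to every row of that group, so the result lines up with the original rows:

```python
import pandas as pd

df = pd.DataFrame({
    "Variable_1": ["G"] * 18,
    "Variable_2": ["M"] * 10 + ["N"] * 8,
    "Variable_3": ["I"] * 6 + ["L"] * 4 + ["J"] * 3 + ["O"] * 5,
    "Target": [230, 231, 233, 231, 230, 214, 211, 212, 123, 345,
               32, 123, 234, 2345, 432, 455, 543, 333],
})
expected = [-1] * 6 + [1] * 7 + [-1] * 5

# Compare first vs last Target per group; the scalar is repeated for each row
df["Output"] = df.groupby("Variable_3")["Target"].transform(
    lambda x: -1 if x.iloc[0] > x.iloc[-1] else 1
)
```

Writing the comparison as `x.iloc[0] > x.iloc[-1]` gives -1 for groups that decrease (I and O) and 1 for groups that increase (L and J), matching the desired output.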