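I have a DataFrame in which pic_code values may repeat. When a pic_code repeats, I want to set the variable "keep" to "t" for the row whose weight is closest to its mpe_wgt.
For example, the second row with pic_code 1234 has "keep" set to "t" because its "weight" is closest to its corresponding "mpe_wgt". My code leaves "keep" as 'f' for every row and "diff" as 100 for every row.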
df['keep']='f'
df['diff']=100
def cln_df(data):
    if pd.unique(data['mpe_wgt']).shape==(1,):
        data['keep'][0:1]='t'
    elif pd.unique(data['mpe_wgt']).shape!=(1,):
        data['diff']=abs(data['weight']-(data['mpe_wgt']/100))
        data['keep'][data['diff']==min(data['diff'])]='t'
    return data
df=df.groupby('pic_code').apply(cln_df)
Before:
pic_code weight mpe_wgt keep diff
1234 45 34 f 100
1234 32 23 f 100
45344 54 35 f 100
234 76 98 f 100
234 65 12 f 100
The output df should be:
pic_code weight mpe_wgt keep diff
1234 45 34 f 11
1234 32 23 t 9
45344 54 35 t 100
234 76 98 t 22
234 65 12 f 53
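I am very new to Python, so please keep the solution as simple as possible. I would really like to get my own approach working, so please nothing too fancy. Thanks in advance for your help.
Answer 0 (score: 6)
Here is one way. Note that I use the Boolean values True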
/ False
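instead of the strings "t"
and "f"
. This is simply good practice.
Note that all of the operations below are vectorized, whereas groupby.apply with a custom function
certainly is not.
Setup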
print(df)
pic_code weight mpe_wgt
0 1234 45 34
1 1234 32 23
2 45344 54 35
3 234 76 98
4 234 65 12
Solution
# calculate difference
df['diff'] = (df['weight'] - df['mpe_wgt']).abs()
# sort by pic_code, then by diff
df = df.sort_values(['pic_code', 'diff'])
# define keep column as True only for non-duplicates by pic_code
df['keep'] = ~df.duplicated('pic_code')
Result
print(df)
pic_code weight mpe_wgt diff keep
3 234 76 98 22 True
4 234 65 12 53 False
1 1234 32 23 9 True
0 1234 45 34 11 False
2 45344 54 35 19 True
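The result above is left in sorted order. As a minimal sketch (the column name keep_label is hypothetical, and the sample frame is rebuilt from the Setup output), the same steps can be followed by sort_index to restore the original row order, and the Booleans can be mapped back to 't' / 'f' if the string labels are really needed:

```python
import pandas as pd

# Sample data rebuilt from the Setup output above.
df = pd.DataFrame({
    'pic_code': [1234, 1234, 45344, 234, 234],
    'weight': [45, 32, 54, 76, 65],
    'mpe_wgt': [34, 23, 35, 98, 12],
})

# Absolute difference between weight and mpe_wgt.
df['diff'] = (df['weight'] - df['mpe_wgt']).abs()

# Sort so the closest row comes first within each pic_code.
df = df.sort_values(['pic_code', 'diff'])

# The first row of each pic_code is the closest one; later rows are duplicates.
df['keep'] = ~df.duplicated('pic_code')

# Restore the original row order using the preserved index.
df = df.sort_index()

# Optional: hypothetical extra column with the 't' / 'f' labels from the question.
df['keep_label'] = df['keep'].map({True: 't', False: 'f'})

print(df)
```

Because sort_values keeps the original index labels, sort_index is enough to undo the reordering.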
Answer 1 (score: 4)
Use:
df['keep'] = df.assign(closest=(df['mpe_wgt']-df['weight']).abs())\
.sort_values('closest').duplicated(subset=['pic_code'])\
.replace({True:'f',False:'t'})
Output:
pic_code weight mpe_wgt keep
0 1234 45 34 f
1 1234 32 23 t
2 45344 54 35 t
3 234 76 98 t
4 234 65 12 f
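The chained expression works because pandas aligns on the index when the result is assigned back, so the sorted order never leaks into df. A sketch of the same logic split into named steps (the intermediate names closest, order and is_dup are illustrative, and the stable mergesort is an added assumption so that tied rows keep their original relative order):

```python
import pandas as pd

# Sample data rebuilt from the output above.
df = pd.DataFrame({
    'pic_code': [1234, 1234, 45344, 234, 234],
    'weight': [45, 32, 54, 76, 65],
    'mpe_wgt': [34, 23, 35, 98, 12],
})

# Step 1: distance between mpe_wgt and weight for every row.
closest = (df['mpe_wgt'] - df['weight']).abs()

# Step 2: index labels ordered from smallest to largest distance.
# mergesort is stable, so tied rows stay in their original order.
order = closest.sort_values(kind='mergesort').index

# Step 3: in that order, every row after the first per pic_code is a duplicate.
is_dup = df.loc[order].duplicated(subset=['pic_code'])

# Step 4: map the flags to labels; assignment aligns on the index,
# so df keeps its original row order.
df['keep'] = is_dup.map({True: 'f', False: 't'})

print(df)
```

Note that duplicated marks only one row per pic_code as the first occurrence, so if two rows tie for the smallest distance, only one of them ends up with 't'.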
Answer 2 (score: 4)
Maybe you can try cumcount:
df['diff'] = (df['weight'] - df['mpe_wgt']).abs()
df['keep'] = df.sort_values('diff').groupby('pic_code').cumcount().eq(0)
df
pic_code weight mpe_wgt diff keep
0 1234 45 34 11 False
1 1234 32 23 9 True
2 45344 54 35 19 True
3 234 76 98 22 True
4 234 65 12 53 False
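For comparison, here is a sketch of a different technique, not the cumcount approach: groupby with transform('min'). It is closer to the question's own idea of comparing each diff with the group minimum, and it stays vectorized. The name group_min is illustrative.

```python
import pandas as pd

# Sample data rebuilt from the output above.
df = pd.DataFrame({
    'pic_code': [1234, 1234, 45344, 234, 234],
    'weight': [45, 32, 54, 76, 65],
    'mpe_wgt': [34, 23, 35, 98, 12],
})

# Absolute difference between weight and mpe_wgt.
df['diff'] = (df['weight'] - df['mpe_wgt']).abs()

# Smallest diff within each pic_code, broadcast back to every row of the group.
group_min = df.groupby('pic_code')['diff'].transform('min')

# A row is kept when its diff equals the minimum of its group.
df['keep'] = df['diff'].eq(group_min)

print(df)
```

Unlike cumcount().eq(0), this variant marks every tied row as True when several rows share the smallest diff, so choose whichever tie behaviour suits the data.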
Answer 3 (score: 2)
Use assign and eval to perform logic similar to the other answers.
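A minimal sketch of how assign and eval could be combined for this task. It is an illustration under assumptions, not this answer's original code: the variable name out, the stable mergesort, and the choice to take the absolute value outside the eval string are all choices made here.

```python
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({
    'pic_code': [1234, 1234, 45344, 234, 234],
    'weight': [45, 32, 54, 76, 65],
    'mpe_wgt': [34, 23, 35, 98, 12],
})

# eval computes the signed difference from a string expression;
# assign adds the absolute value as a new column on a copy.
out = df.assign(diff=df.eval('weight - mpe_wgt').abs())

# Sort by diff, flag every row after the first per pic_code as a duplicate,
# then negate. assign aligns the flags on the index, so the original
# row order is preserved.
out = out.assign(
    keep=~out.sort_values('diff', kind='mergesort').duplicated('pic_code')
)

print(out)
```

Both assign calls return new DataFrames, so the original df is left unchanged.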