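I have the DataFrame below, and I need to select rows for each code based on the canceled and order columns.
Say code xxx has orders [6, 1, 5, 1] and a canceled value of 11. I need an algorithm that selects the rows whose orders add up to 11, here the rows with orders [6 & 5], and then builds a new DataFrame with the corresponding IDs and the total order count for each code, as shown below.
If no rows match, select the closest ID and add it to the list together with its difference from canceled, as shown below: 111111 is the selected ID, and 35 is the difference between 55 and 20. I need an algorithm that can handle 10k rows.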
| code | canceled | order | ids |
|------|----------|-------|-----|
| xxx | 11.0 | 13 | [128281, 128283] |
| cvd | 20 | 55 | [111111, 35] |
df = [
{"code":"xxx","canceled":11.0,"id":"128281","order":6},
{"code":"xxx","canceled":11.0,"id":"128282","order":1},
{"code":"xxx","canceled":11.0,"id":"128283","order":5},
{"code":"xxx","canceled":11.0,"id":"128284","order":1},
{"code":"xxS","canceled":0.0,"id":"108664","order":4},
{"code":"xxS","canceled":0.0,"id":"110515","order":1},
{"code":"xxS","canceled":0.0,"id":"113556","order":1},
{"code":"eeS","canceled":5.0,"id":"115236","order":1},
{"code":"eeS","canceled":5.0,"id":"108586","order":1},
{"code":"eeS","canceled":5.0,"id":"114107","order":1},
{"code":"eeS","canceled":5.0,"id":"113472","order":3},
{"code":"eeS","canceled":5.0,"id":"114109","order":3},
{"code":"544W","canceled":44.0,"id":"107650","order":20},
{"code":"544W","canceled":44.0,"id":"127763","order":4},
{"code":"544W","canceled":44.0,"id":"128014","order":20},
{"code":"544W","canceled":44.0,"id":"132434","order":58},
{"code":"cvd","canceled":20.0,"id":"11111","order":55}
]
I tried the solution from a previous project, but it did not work here, even though both address the same problem. Can anyone help me with this?
from itertools import combinations

def combs(lst, n):
    return (c for k in range(1, n+1) for c in combinations(lst, k))

def best_match(lst, target, n=20):
    return min(combs(lst, n), key=lambda c: (abs(target - sum(c)), len(c)))

best_match(np.array(df['order']), np.array(df['canceled']))
# sorted Rows
sorted_rows = df.apply(best_match(np.array(CV['order']), np.array(CV['canceled'])))
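The call above appears to pass the whole `order` and `canceled` columns at once, so `target - sum(c)` is compared against an array rather than a single number. Below is a minimal sketch of running the same combination search once per `code`, assuming each code has a single `canceled` value and that the data is the plain list of dicts shown earlier. The names `best_match_ids`, `match_per_code`, and `sample` are hypothetical and not part of the original code.

```python
from collections import defaultdict
from itertools import combinations


def best_match_ids(orders, ids, target, max_size=20):
    """Return (chosen_ids, difference) for the subset of orders closest to target.

    Brute force over combinations of up to max_size rows; ties go to fewer rows.
    """
    pairs = list(zip(orders, ids))
    sizes = range(1, min(max_size, len(pairs)) + 1)
    candidates = (c for k in sizes for c in combinations(pairs, k))
    best = min(
        candidates,
        key=lambda c: (abs(target - sum(o for o, _ in c)), len(c)),
    )
    return [i for _, i in best], abs(target - sum(o for o, _ in best))


def match_per_code(rows):
    """Group rows by code, then search each group separately."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["code"]].append(row)
    return {
        code: best_match_ids(
            [m["order"] for m in members],
            [m["id"] for m in members],
            members[0]["canceled"],  # assumption: canceled is constant per code
        )
        for code, members in groups.items()
    }


# A subset of the rows from the question, used only for illustration.
sample = [
    {"code": "xxx", "canceled": 11.0, "id": "128281", "order": 6},
    {"code": "xxx", "canceled": 11.0, "id": "128282", "order": 1},
    {"code": "xxx", "canceled": 11.0, "id": "128283", "order": 5},
    {"code": "xxx", "canceled": 11.0, "id": "128284", "order": 1},
    {"code": "cvd", "canceled": 20.0, "id": "11111", "order": 55},
]
matches = match_per_code(sample)
```

Because `min` ranks by absolute difference first, the same function covers both the exact-match case and the "closest" fallback. The search is still exponential in the number of rows per code, so it only suits small groups.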
Answer 0 (score: 1)
Use:
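import itertools

import pandas as pd

df = pd.DataFrame(df)
# Collect each code's orders and ids into lists; canceled is the same within a code.
df_g = df.groupby('code').agg({'canceled': 'first', 'order': list, 'id': list})

def get_combo(row):
    ind = row['canceled']
    weight = row['order']
    id_ = row['id']
    cmb = []
    ids = []
    # Enumerate every combination of orders alongside the matching ids.
    for size in range(1, len(weight) + 1):
        cmb += itertools.combinations(weight, size)
        ids += itertools.combinations(id_, size)
    try:
        # Position of the first combination whose sum equals canceled.
        indx = [sum(i) for i in cmb].index(ind)
        return (cmb[indx], ids[indx])
    except ValueError:
        # No combination adds up exactly to canceled.
        return ([], [])

t = df_g.apply(get_combo, axis=1).apply(pd.Series)
t.columns = ['combs', 'ids']
df_g = pd.concat([df_g, t[['ids']]], axis=1).drop('id', axis=1).reset_index()
df_g['order'] = df_g['order'].apply(sum)
print(df_g)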
Output
code canceled order ids
0 544W 44.0 102 (107650, 127763, 128014)
1 cvd 20.0 55 []
2 eeS 5.0 9 (115236, 108586, 113472)
3 xxS 0.0 6 []
4 xxx 11.0 13 (128281, 128283)
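Enumerating every combination grows exponentially with the number of rows per code, which may become a problem at the 10k rows mentioned in the question. It also leaves `ids` empty when nothing matches exactly. As one possible alternative, here is a sketch of a dynamic-programming search over reachable sums that returns the closest subset and its difference from `canceled`. It assumes the order values are non-negative integers. It also assumes a `canceled` of 0 means no rows should be selected. The function name `closest_subset` is hypothetical.

```python
def closest_subset(orders, ids, target):
    """Pick rows whose integer orders sum as close to target as possible.

    `reachable` maps each achievable sum to one tuple of row indices that
    produces it. Assumes non-negative integer orders; the work is bounded
    by len(orders) * (number of distinct reachable sums).
    """
    if target == 0:
        # Assumption: nothing was canceled, so nothing is selected.
        return [], 0
    reachable = {0: ()}
    for idx, value in enumerate(orders):
        # Snapshot the current sums so each row is used at most once.
        for total, chosen in list(reachable.items()):
            new_total = total + value
            if new_total not in reachable:
                reachable[new_total] = chosen + (idx,)
    # Drop the empty selection, then take the sum nearest to target.
    candidates = {s: c for s, c in reachable.items() if c}
    if not candidates:
        return [], None
    best_sum = min(
        candidates,
        key=lambda s: (abs(target - s), len(candidates[s])),
    )
    return [ids[i] for i in candidates[best_sum]], abs(target - best_sum)
```

For the `xxx` rows (orders 6, 1, 5, 1 and canceled 11.0) this selects ids 128281 and 128283 with a difference of 0. For the single `cvd` row it selects that row's id with a difference of 35. It could be applied per code in the same way as `get_combo`, for example through `df_g.apply(..., axis=1)`.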