Is there a simple way to select values from one column based on another column in pandas (Python)?

Date: 2019-11-16 17:55:22

Tags: python pandas

I have the DataFrame below, and for each code I need to select rows based on the canceled and order columns.

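Say code xxx has orders [6, 1, 5, 1] and canceled is 11. I need an algorithm that selects the rows with orders 6 and 5, because together they add up to the canceled value of 11, and then builds a new DataFrame with the matching IDs and the total of all orders for each code, as shown below.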
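A minimal sketch of the exact-match step for a single code, assuming the orders are integers and that the set with the fewest rows should win. `exact_subset` is a hypothetical helper name; it brute-forces combinations, so it only suits codes with a handful of rows.

```python
from itertools import combinations


def exact_subset(orders, ids, target):
    """Return the ids of the smallest set of rows whose orders sum to target, or None."""
    # Try sets of 1, 2, ... rows, so the first hit uses the fewest rows
    for k in range(1, len(orders) + 1):
        for combo in combinations(range(len(orders)), k):
            if sum(orders[i] for i in combo) == target:
                return [ids[i] for i in combo]
    return None  # no set of rows sums exactly to target


# Code xxx from the question: orders [6, 1, 5, 1], canceled 11.0
print(exact_subset([6, 1, 5, 1], ["128281", "128282", "128283", "128284"], 11.0))
# ['128281', '128283']
```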

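If no combination of rows matches, select the closest ID and add it to the list together with its difference from canceled. In the example below, 111111 is the selected ID and 35 is the difference between 55 and 20. The algorithm needs to handle around 10k rows.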

    code   canceled   order   ids
    xxx    11.0       13      [128281, 128283]
    cvd    20         55      [111111, 35]
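For the fallback described above, a minimal sketch, assuming "closest" means the single row whose order is nearest to canceled (the question does not say whether combinations of rows should also count). `closest_row` is a hypothetical helper name.

```python
def closest_row(orders, ids, target):
    """Return [id, |order - target|] for the single row whose order is nearest to target."""
    # Pick the row position with the smallest absolute difference (first one wins ties)
    best = min(range(len(orders)), key=lambda i: abs(orders[i] - target))
    return [ids[best], abs(orders[best] - target)]


# Code cvd from the question: a single row with order 55, canceled 20.0
print(closest_row([55], ["11111"], 20.0))  # ['11111', 35.0]
```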

df = [
    {"code":"xxx","canceled":11.0,"id":"128281","order":6},
    {"code":"xxx","canceled":11.0,"id":"128282","order":1},
    {"code":"xxx","canceled":11.0,"id":"128283","order":5},
    {"code":"xxx","canceled":11.0,"id":"128284","order":1},
    {"code":"xxS","canceled":0.0,"id":"108664","order":4},
    {"code":"xxS","canceled":0.0,"id":"110515","order":1},
    {"code":"xxS","canceled":0.0,"id":"113556","order":1},
    {"code":"eeS","canceled":5.0,"id":"115236","order":1},
    {"code":"eeS","canceled":5.0,"id":"108586","order":1},
    {"code":"eeS","canceled":5.0,"id":"114107","order":1},
    {"code":"eeS","canceled":5.0,"id":"113472","order":3},
    {"code":"eeS","canceled":5.0,"id":"114109","order":3},
    {"code":"544W","canceled":44.0,"id":"107650","order":20},
    {"code":"544W","canceled":44.0,"id":"127763","order":4},
    {"code":"544W","canceled":44.0,"id":"128014","order":20},
    {"code":"544W","canceled":44.0,"id":"132434","order":58},
    {"code":"cvd","canceled":20.0,"id":"11111","order":55}
]

I tried a solution from a previous project; although both deal with the same problem, it didn't work here. Can anyone help?

import numpy as np
from itertools import combinations

def combs(lst, n):
    # All combinations of 1..n elements of lst
    return (c for k in range(1, n+1) for c in combinations(lst, k))

def best_match(lst, target, n=20):
    # Combination whose sum is closest to target; ties go to the shorter combination
    return min(combs(lst, n), key=lambda c: (abs(target - sum(c)), len(c)))

best_match(np.array(df['order']), np.array(df['canceled']))

# sorted Rows
sorted_rows = df.apply(best_match(np.array(CV['order']), np.array(CV['canceled'])))
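The attempt above indexes `df['order']` while `df` is still a plain list of dicts, and passes whole columns, so `np.array(df['canceled'])` is an array rather than a single target. A sketch that keeps the same `combs`/`best_match` helpers but calls them once per code, assuming `df` is first converted with `pd.DataFrame` (only a subset of the question's rows is used here). Note that it returns order values, not IDs.

```python
from itertools import combinations

import pandas as pd


def combs(lst, n):
    # All combinations of 1..n elements of lst
    return (c for k in range(1, n + 1) for c in combinations(lst, k))


def best_match(lst, target, n=20):
    # Combination whose sum is closest to target; ties go to the shorter one
    return min(combs(lst, n), key=lambda c: (abs(target - sum(c)), len(c)))


# A subset of the question's rows, converted from a list of dicts first
df = pd.DataFrame([
    {"code": "xxx", "canceled": 11.0, "id": "128281", "order": 6},
    {"code": "xxx", "canceled": 11.0, "id": "128282", "order": 1},
    {"code": "xxx", "canceled": 11.0, "id": "128283", "order": 5},
    {"code": "xxx", "canceled": 11.0, "id": "128284", "order": 1},
    {"code": "cvd", "canceled": 20.0, "id": "11111", "order": 55},
])

# Call best_match once per code, with that code's single canceled value as the target
matches = {
    code: best_match(list(g["order"]), float(g["canceled"].iloc[0]))
    for code, g in df.groupby("code")
}
print(matches)
```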

1 Answer

Answer (score: 1)

Try this:

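import itertools

import pandas as pd

df = pd.DataFrame(df)
# One row per code: the shared canceled value plus lists of that code's orders and ids
df_g = df.groupby('code').agg({'canceled': 'first', 'order': list, 'id': list})

def get_combo(row):
    target = row['canceled']
    weight = row['order']
    id_ = row['id']
    cmb = []
    ids = []
    # Enumerate combinations of 1, 2, ..., n rows; smaller combinations come first
    for k in range(1, len(weight) + 1):
        cmb += itertools.combinations(weight, k)
        ids += itertools.combinations(id_, k)
    try:
        # Position of the first combination whose orders sum exactly to canceled
        indx = [sum(c) for c in cmb].index(target)
        return (cmb[indx], ids[indx])
    except ValueError:
        # No combination matches exactly
        return ([], [])

t = df_g.apply(get_combo, axis=1).apply(pd.Series)
t.columns = ['combs', 'ids']
df_g = pd.concat([df_g, t[['ids']]], axis=1).drop('id', axis=1).reset_index()
df_g['order'] = df_g['order'].apply(sum)
print(df_g)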

Output

   code  canceled  order                       ids
0  544W      44.0    102  (107650, 127763, 128014)
1   cvd      20.0     55                        []
2   eeS       5.0      9  (115236, 108586, 113472)
3   xxS       0.0      6                        []
4   xxx      11.0     13          (128281, 128283)
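The answer's search enumerates every combination of a code's rows, which grows as 2^n with the number of rows in that code, and it returns [] when nothing matches (cvd, xxS) instead of the closest-ID fallback the question asks for. As a hedged alternative, here is a subset-sum dynamic program over reachable totals plus a single-row fallback. It assumes orders are non-negative integers and canceled holds whole numbers; `subset_ids`, `closest_row` and `summarize` are hypothetical names. Unlike the answer it does not guarantee the fewest rows, although for 544W, eeS and xxx it picks the same IDs.

```python
import pandas as pd


def subset_ids(orders, ids, target):
    """Return ids of rows whose orders sum exactly to target, or None.

    Subset-sum dynamic programming over reachable totals; assumes
    non-negative integer orders and a whole-number target.
    """
    target = int(target)
    # reach maps each reachable total (<= target) to the row positions producing it
    reach = {0: []}
    for i, value in enumerate(orders):
        # Iterate over a snapshot so each row is used at most once
        for total, picked in list(reach.items()):
            new_total = total + int(value)
            if new_total <= target and new_total not in reach:
                reach[new_total] = picked + [i]
    # An empty selection (target 0) is not treated as a match
    if reach.get(target):
        return [ids[i] for i in reach[target]]
    return None


def closest_row(orders, ids, target):
    """Fallback: [id, |order - target|] for the single row nearest to target."""
    best = min(range(len(orders)), key=lambda i: abs(orders[i] - target))
    return [ids[best], abs(orders[best] - target)]


def summarize(df):
    """One output row per code: canceled, total orders, and the selected ids."""
    out = []
    for code, g in df.groupby("code"):
        orders, ids = list(g["order"]), list(g["id"])
        target = float(g["canceled"].iloc[0])
        picked = subset_ids(orders, ids, target)
        if picked is None:
            picked = closest_row(orders, ids, target)
        out.append({"code": code, "canceled": target,
                    "order": sum(orders), "ids": picked})
    return pd.DataFrame(out)


# A subset of the question's rows
df = pd.DataFrame([
    {"code": "xxx", "canceled": 11.0, "id": "128281", "order": 6},
    {"code": "xxx", "canceled": 11.0, "id": "128282", "order": 1},
    {"code": "xxx", "canceled": 11.0, "id": "128283", "order": 5},
    {"code": "xxx", "canceled": 11.0, "id": "128284", "order": 1},
    {"code": "cvd", "canceled": 20.0, "id": "11111", "order": 55},
])
print(summarize(df))
```

Because `reach` never holds more than canceled + 1 totals, the work per code grows roughly with (rows in the code) × (canceled + 1) rather than with the number of combinations.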