Reassign unique values - pandas DataFrame

Time: 2019-01-08 23:57:07

Tags: python pandas numpy dataframe assign

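I am trying to assign unique values in a pandas df to specific individuals.

For the df below, [Area] and [Place] together make up unique values, which are various jobs. These values will be assigned to individuals, with the overall aim of using as few individuals as possible.

The trick is that these values are constantly starting and finishing, and they run for different lengths of time. The most unique values assigned to an individual at any one time is 3. [On] displays how many unique values for [Place] and [Area] are currently occurring.

So this provides a concrete guide to how many individuals I need. For example, 3 unique values on = 1 person, 6 unique values on = 2 persons.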
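The rule of thumb above amounts to a ceiling division. A minimal sketch, assuming a hypothetical helper named `people_needed` (it is not part of the question or the answer):

```python
import math

def people_needed(on, per_person=3):
    """Smallest number of people able to cover `on` concurrent unique jobs.

    `people_needed` and `per_person` are illustrative names, not from the original post.
    """
    return math.ceil(on / per_person)

for on in (1, 3, 4, 6):
    print(on, people_needed(on))
```

With a limit of 3 jobs per person, 3 jobs need 1 person, and anything from 4 to 6 jobs needs 2.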

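I can't simply use a groupby statement where I assign the first 3 unique values to individual 1, the next 3 unique values to individual 2, etc.

What I envisage is that, when there are more than 3 unique values, I want to group values in [Area] together first and then combine the leftovers. So look to assign the same values in [Area] to an individual (up to 3). Then, if there are _leftover_ values (<3), they should be combined to make a group of 3, where possible.

The way I envisage this working is to look ahead by an hour. For each new row of values, the script should see how many values will be [On] (this indicates how many individuals are required in total). Where there are more than 3 unique values, they should be assigned by grouping the same values in [Area]. If there are leftover values, they should be combined to make up a group of 3.

For the df below, the number of unique values occurring for [Place] and [Area] varies between 1 and 6. Therefore, we should never have more than 2 individuals assigned. When there are more than 3 unique values, they should be assigned by [Area] first. The leftover values should be combined with other individuals that have fewer than 3 unique values.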
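The grouping rule described above (same [Area] first, then combine leftovers) can be sketched for a single snapshot of jobs. This is only an illustration under simplifying assumptions: it ignores start and end times and the one-hour look-ahead, and `group_jobs` is a hypothetical name rather than code from the post.

```python
from collections import defaultdict

def group_jobs(jobs, per_person=3):
    """Split unique (area, place) pairs into groups of at most `per_person`.

    Jobs in the same area are grouped first; whatever is left over
    is then combined across areas.
    """
    by_area = defaultdict(list)
    for area, place in jobs:
        by_area[area].append((area, place))

    groups, leftovers = [], []
    for area_jobs in by_area.values():
        # number of jobs that fit into complete single-area groups
        full = len(area_jobs) - len(area_jobs) % per_person
        for i in range(0, full, per_person):
            groups.append(area_jobs[i:i + per_person])
        leftovers.extend(area_jobs[full:])

    # combine the leftovers from all areas into groups of up to `per_person`
    for i in range(0, len(leftovers), per_person):
        groups.append(leftovers[i:i + per_person])
    return groups

jobs = [('A', 'House 1'), ('A', 'House 2'), ('A', 'House 3'),
        ('A', 'House 4'), ('B', 'House 1')]
for person, group in enumerate(group_jobs(jobs), start=1):
    print(person, group)
# 1 [('A', 'House 1'), ('A', 'House 2'), ('A', 'House 3')]
# 2 [('A', 'House 4'), ('B', 'House 1')]
```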

Apologies for the large df. It's the only way I could replicate the problem!

[DataFrame code not preserved]

Output:

[output not preserved]

Expected output, with comments on why I think it should be assigned that way:

[image: expected output]

1 answer:

Answer 0 (score: 5)

There's a live version of this answer online that you can try for yourself.

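The problem

The bug you're seeing is due to (yet another) interesting edge case of your problem. During the 6th job (row index 6), the code assigns person 2 to (A, House 4). It then sees that area A repeats within an hour, so it holds person 2 in that area. This makes person 2 unavailable for the next job, which is in area B.

However, there's no reason to hold person 2 in area A for the sake of the job in (A, House 1), since the unique combination of area and place (A, House 1) has already been assigned to person 1.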

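The solution

The problem can be fixed by considering only unique combinations of area and place when deciding whether to hold a person in an area. Only a couple of lines of code need to change.

First, we construct a list of areas that corresponds to the unique (area, place) pairs: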

unqareas = df[['Area', 'Place']].drop_duplicates()['Area'].values
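As a quick check of what this line produces, here is a small sketch that uses only the Area and Place columns from the test data in the full listing:

```python
import pandas as pd

# Area and Place columns copied from the test data in the full listing
df = pd.DataFrame({
    'Place': ['House 1', 'House 2', 'House 3', 'House 1', 'House 2',
              'House 3', 'House 4', 'House 1', 'House 1'],
    'Area':  ['A', 'A', 'A', 'A', 'A', 'A', 'A', 'B', 'A'],
})

areas = df['Area'].values
# one entry per unique (Area, Place) pair, in order of first appearance
unqareas = df[['Area', 'Place']].drop_duplicates()['Area'].values

print(len(areas))      # 9
print(list(unqareas))  # ['A', 'A', 'A', 'A', 'B']
```

The nine rows collapse to five unique jobs, so repeat visits to the same house no longer count as a new recurrence of area A.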

Then we substitute unqareas for areas in the first line of the code that identifies holds:

ixrep = np.argmax(np.triu(unqareas.reshape(-1, 1)==unqareas, k=1), axis=1)
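To make the effect of the substitution visible, the sketch below evaluates the same expression on both arrays. The values are copied from the test data, and `next_repeat` is a hypothetical wrapper name used only for this illustration.

```python
import numpy as np

areas = np.array(['A', 'A', 'A', 'A', 'A', 'A', 'A', 'B', 'A'])
unqareas = np.array(['A', 'A', 'A', 'A', 'B'])

def next_repeat(a):
    # For each position i, the index of the first later position j > i
    # with a[j] == a[i]; 0 when the value never occurs again.
    return np.argmax(np.triu(a.reshape(-1, 1) == a, k=1), axis=1)

print(next_repeat(areas))     # [1 2 3 4 5 6 8 0 0]
print(next_repeat(unqareas))  # [1 2 3 0 0]
```

With `areas`, row 6 (A, House 4) is followed by another A at row 8, which is what triggered the unwanted hold. With `unqareas`, the entry for (A, House 4) has no later repeat. One thing to keep in mind: the second result indexes the de-duplicated array rather than the DataFrame rows, while the listing uses it to index the row-based `times` array. For this data set the resulting hold flags come out the same either way, but other inputs may need an explicit mapping back to row positions.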

Complete listing/test

import pandas as pd
import numpy as np
from collections import Counter

d = ({
     'Time' : ['8:03:00','8:07:00','8:10:00','8:23:00','8:27:00','8:30:00','8:37:00','8:40:00','8:48:00'],
     'Place' : ['House 1','House 2','House 3','House 1','House 2','House 3','House 4','House 1','House 1'],
     'Area' : ['A','A','A','A','A','A','A','B','A'],
     'Person' : ['Person 1','Person 1','Person 1','Person 1','Person 1','Person 1','Person 2','Person 3','Person 1'],
     'On' : ['1','2','3','3','3','3','4','5','5']
     })

df = pd.DataFrame(data=d)

def getAssignedPeople(df, areasPerPerson):
    areas = df['Area'].values
    unqareas = df[['Area', 'Place']].drop_duplicates()['Area'].values
    places = df['Place'].values
    times = pd.to_datetime(df['Time']).values

    maxPerson = np.ceil(areas.size / float(areasPerPerson)) - 1
    assignmentCount = Counter()
    assignedPeople = []
    assignedPlaces = {}
    heldPeople = {}
    heldAreas = {}
    holdAvailable = True
    person = 0

    # search for repeated areas. Mark them if the next repeat occurs within an hour
    ixrep = np.argmax(np.triu(unqareas.reshape(-1, 1)==unqareas, k=1), axis=1)
    holds = np.zeros(areas.size, dtype=bool)
    holds[ixrep.nonzero()] = (times[ixrep[ixrep.nonzero()]] - times[ixrep.nonzero()]) < np.timedelta64(1, 'h')

    for area,place,hold in zip(areas, places, holds):
        if (area, place) in assignedPlaces:
            # this unique (area, place) has already been assigned to someone
            assignedPeople.append(assignedPlaces[(area, place)])
            continue

        if assignmentCount[person] >= areasPerPerson:
            # the current person is already assigned to enough areas, move on to the next
            a = heldPeople.pop(person, None)
            heldAreas.pop(a, None)
            person += 1

        if area in heldAreas:
            # assign to the person held in this area
            p = heldAreas.pop(area)
            heldPeople.pop(p)
        else:
            # get the first non-held person. If we need to hold in this area, 
            # also make sure the person has at least 2 free assignment slots,
            # though if it's the last person assign to them anyway 
            p = person
            while p in heldPeople or (hold and holdAvailable and (areasPerPerson - assignmentCount[p] < 2)) and not p==maxPerson:
                p += 1

        assignmentCount.update([p])
        assignedPlaces[(area, place)] = p
        assignedPeople.append(p)

        if hold:
            if p==maxPerson:
                # mark that there are no more people available to perform holds
                holdAvailable = False

            # this area recurs within an hour, mark that the person should be held here
            heldPeople[p] = area
            heldAreas[area] = p

    return assignedPeople

def allocatePeople(df, areasPerPerson=3):
    assignedPeople = getAssignedPeople(df, areasPerPerson=areasPerPerson)
    df = df.copy()
    df.loc[:,'Person'] = df['Person'].unique()[assignedPeople]
    return df

print(allocatePeople(df))

Output:

      Time    Place Area    Person On
0  8:03:00  House 1    A  Person 1  1
1  8:07:00  House 2    A  Person 1  2
2  8:10:00  House 3    A  Person 1  3
3  8:23:00  House 1    A  Person 1  3
4  8:27:00  House 2    A  Person 1  3
5  8:30:00  House 3    A  Person 1  3
6  8:37:00  House 4    A  Person 2  4
7  8:40:00  House 1    B  Person 2  5
8  8:48:00  House 1    A  Person 1  5
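
One way to sanity-check this result against the constraints in the question is sketched below. The frame is retyped from the output above, and the check counts unique jobs per person over the whole frame (the same way `assignmentCount` accumulates in the listing), not per moment in time.

```python
import pandas as pd

# Area, Place and Person columns retyped from the output above
result = pd.DataFrame({
    'Place':  ['House 1', 'House 2', 'House 3', 'House 1', 'House 2',
               'House 3', 'House 4', 'House 1', 'House 1'],
    'Area':   ['A', 'A', 'A', 'A', 'A', 'A', 'A', 'B', 'A'],
    'Person': ['Person 1'] * 6 + ['Person 2', 'Person 2', 'Person 1'],
})

pairs = result[['Area', 'Place', 'Person']].drop_duplicates()

# number of unique (Area, Place) jobs handled by each person
jobs_per_person = pairs.groupby('Person').size()
# number of different people assigned to each unique job
people_per_job = pairs.groupby(['Area', 'Place']).size()

print(jobs_per_person.to_dict())     # {'Person 1': 3, 'Person 2': 2}
print((people_per_job == 1).all())   # True
```

Nobody exceeds 3 unique jobs, every job stays with a single person, and only 2 people are used.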