I am trying to assign unique values in a pandas df to specific individuals.
对于下面的df
,[Area]
和[Place]
将共同构成unique
值,这些值是各种 job 。这些值将分配给个人,其总体目标是使用尽可能少的个人。
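The trick is that these values are constantly starting and finishing, and they last for varying lengths of time. The most unique values assigned to an individual at any one time is 3. [On] displays how many unique values for [Place] and [Area] are currently occurring.

So this provides a concrete guide to how many individuals I need. For example, 3 unique values on = 1 person, 6 unique values on = 2 people.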
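The rule of thumb above amounts to dividing the number of concurrent unique values by the per-person cap and rounding up. The sketch below assumes [On] holds that concurrent count, as in the example data in the answer (where it is stored as strings), and a cap of 3. `people_needed` is a hypothetical helper name.

```python
import math

def people_needed(on_values, cap=3):
    # hypothetical helper: ceil(on / cap) people for each concurrent count
    return [math.ceil(int(on) / cap) for on in on_values]

# [On] values from the example data (stored as strings there)
print(people_needed(['1', '2', '3', '3', '3', '3', '4', '5', '5']))
# → [1, 1, 1, 1, 1, 1, 2, 2, 2]
```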
I can't do a groupby statement where I assign the first 3 unique values to individual 1, the next 3 unique values to individual 2, etc.
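For reference, the static chunking just described could be written with `groupby(...).ngroup()`. The sketch assumes jobs are numbered in order of first appearance (`sort=False`). The `Individual` column name is hypothetical. This approach ignores timing entirely, so it never frees up a slot when a job finishes, which is why the question rules it out.

```python
import pandas as pd

# Area/Place columns in the same shape as the example data
df = pd.DataFrame({
    'Area':  ['A', 'A', 'A', 'A', 'A', 'A', 'A', 'B', 'A'],
    'Place': ['House 1', 'House 2', 'House 3', 'House 1', 'House 2',
              'House 3', 'House 4', 'House 1', 'House 1'],
})

# number each unique (Area, Place) pair in order of first appearance...
job_id = df.groupby(['Area', 'Place'], sort=False).ngroup()
# ...then jobs 0-2 go to individual 0, jobs 3-5 to individual 1, and so on
df['Individual'] = job_id // 3
print(df)
```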
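What I envisage is that, when unique values are greater than 3, I want to group values in [Area] first and then combine the leftovers. So look to assign the same values in [Area] to an individual (up to 3). Then, if there are leftover values (<3), they should be combined to make groups of 3, where possible.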
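The grouping rule above can be sketched for a single snapshot of currently active jobs: fill people with jobs from the same [Area] first, then combine leftovers of fewer than 3 into shared groups. Timing is ignored here.

- `group_area_first` is a hypothetical helper.
- Each area's leftover chunk is kept together on one person.
- Leftovers are packed first-fit decreasing, a standard bin-packing heuristic. With a cap of 3 the leftover chunks are only 1 or 2 jobs, and in that case this greedy packing should use the fewest groups.

```python
from collections import defaultdict

def group_area_first(jobs, cap=3):
    """jobs: list of (area, place) pairs active at one moment.
    Returns a list of groups, one per person (hypothetical helper)."""
    by_area = defaultdict(list)
    for area, place in jobs:
        by_area[area].append((area, place))

    groups, leftovers = [], []
    for area_jobs in by_area.values():
        full = len(area_jobs) // cap * cap
        # full groups made of a single area
        for i in range(0, full, cap):
            groups.append(area_jobs[i:i + cap])
        # whatever is left over (< cap) from this area, kept together
        if full < len(area_jobs):
            leftovers.append(area_jobs[full:])

    # first-fit decreasing: place bigger leftover chunks first
    leftovers.sort(key=len, reverse=True)
    packed = []
    for chunk in leftovers:
        for g in packed:
            if len(g) + len(chunk) <= cap:
                g.extend(chunk)
                break
        else:
            packed.append(list(chunk))
    return groups + packed

print(group_area_first([('A', 'House 1'), ('A', 'House 2'), ('A', 'House 3'),
                        ('A', 'House 4'), ('B', 'House 1')]))
```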
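The way I envisage this working is to look into the future by an hour. For each new row of values, the script should see how many values will be [On] (this indicates how many individuals are required). Where unique values are > 3, they should be assigned by grouping the same value in [Area]. If there are leftover values, they should be combined to make groups of 3.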
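The one-hour look-ahead can be sketched as follows. The df has no explicit end times, so this sketch assumes a job counts as upcoming if one of its rows starts within the next hour. It therefore measures "distinct jobs ahead", a proxy for future [On], not [On] itself. `unique_jobs_ahead` is a hypothetical helper.

```python
import math
import pandas as pd

def unique_jobs_ahead(df, window='1h'):
    # hypothetical helper: for each row, count distinct (Area, Place) pairs
    # with a row starting in [this row's time, this row's time + window)
    t = pd.to_datetime(df['Time'], format='%H:%M:%S')  # only time differences matter
    horizon = pd.Timedelta(window)
    jobs = list(zip(df['Area'], df['Place']))
    counts = []
    for start in t:
        in_window = (t >= start) & (t < start + horizon)
        counts.append(len({job for job, m in zip(jobs, in_window) if m}))
    return counts

df = pd.DataFrame({
    'Time':  ['8:03:00', '8:07:00', '8:10:00', '8:23:00', '8:27:00',
              '8:30:00', '8:37:00', '8:40:00', '8:48:00'],
    'Place': ['House 1', 'House 2', 'House 3', 'House 1', 'House 2',
              'House 3', 'House 4', 'House 1', 'House 1'],
    'Area':  ['A', 'A', 'A', 'A', 'A', 'A', 'A', 'B', 'A'],
})
ahead = unique_jobs_ahead(df)
print(ahead)                             # [5, 5, 5, 5, 5, 4, 3, 2, 1]
print([math.ceil(n / 3) for n in ahead])  # [2, 2, 2, 2, 2, 2, 1, 1, 1]
```

Dividing by 3 and rounding up turns each look-ahead count into the number of people to plan for.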
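For the df below, the number of unique values occurring for [Place] and [Area] varies between 1 and 6, so we should never have more than 2 individuals assigned. When unique values are > 3, they should be assigned by [Area] first. The leftover values should be combined with other individuals that have fewer than 3 unique values.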
Apologies for the large df. It's the only way I can replicate the problem!
Out:
Expected output, with comments on why I think it should be assigned:
Answer 0 (score: 5)
There's a live version of this answer online that you can try for yourself.
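The error you're seeing is due to (yet another) interesting edge case of your problem. During the 6th job, the code assigns person 2 to (A, House 4). It then sees that area A repeats within an hour, so it holds person 2 in that area. This makes person 2 unavailable for the next job, which is in area B.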
However, there's no reason to hold person 2 in area A. The job in A that recurs, (A, House 1), is a unique combination of area and place that has already been assigned to person 1.
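The problem can be fixed by considering only unique combinations of area and place when deciding when to hold a person in an area. Only a couple of lines of code need to change.
First, we construct a list of areas that corresponds to the unique (area, place) pairs: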
unqareas = df[['Area', 'Place']].drop_duplicates()['Area'].values
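To see what this line changes, here it is applied to the Area/Place columns of the example data from the code below. The repeated visits to the same (area, place), such as the later (A, House 1) rows, are dropped. The only repeats left in `unqareas` are therefore genuinely different places within the same area.

```python
import pandas as pd

# Area/Place columns of the example data
df = pd.DataFrame({
    'Place': ['House 1', 'House 2', 'House 3', 'House 1', 'House 2',
              'House 3', 'House 4', 'House 1', 'House 1'],
    'Area':  ['A', 'A', 'A', 'A', 'A', 'A', 'A', 'B', 'A'],
})

areas = df['Area'].values
# keep the first row of each (Area, Place) pair, then take its Area
unqareas = df[['Area', 'Place']].drop_duplicates()['Area'].values

print(list(areas))     # ['A', 'A', 'A', 'A', 'A', 'A', 'A', 'B', 'A']
print(list(unqareas))  # ['A', 'A', 'A', 'A', 'B']
```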
Then, in the first line of the code that identifies holds, we substitute unqareas for areas:
ixrep = np.argmax(np.triu(unqareas.reshape(-1, 1)==unqareas, k=1), axis=1)
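The `triu`/`argmax` line finds, for each entry, the index of the next later entry with the same area. Here is the same expression on a tiny hypothetical array `a`. Because `k=1` keeps only columns strictly to the right of the diagonal, a real "next repeat" index is always at least 1. A result of 0 therefore means "no later repeat", which is why the code then filters with `ixrep.nonzero()`.

```python
import numpy as np

a = np.array(['A', 'A', 'B', 'A'])
# same_later[i, j] is True when a[i] == a[j] and j > i
same_later = np.triu(a.reshape(-1, 1) == a, k=1)
# argmax returns the first True column in each row, i.e. the next repeat;
# rows with no later repeat are all False, so argmax returns 0 for them
ixrep = np.argmax(same_later, axis=1)
print(ixrep)  # [1 3 0 0]
```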
Here is the complete code with both changes applied:
import pandas as pd
import numpy as np
from collections import Counter

d = ({
    'Time' : ['8:03:00','8:07:00','8:10:00','8:23:00','8:27:00','8:30:00','8:37:00','8:40:00','8:48:00'],
    'Place' : ['House 1','House 2','House 3','House 1','House 2','House 3','House 4','House 1','House 1'],
    'Area' : ['A','A','A','A','A','A','A','B','A'],
    'Person' : ['Person 1','Person 1','Person 1','Person 1','Person 1','Person 1','Person 2','Person 3','Person 1'],
    'On' : ['1','2','3','3','3','3','4','5','5']
    })

df = pd.DataFrame(data=d)

def getAssignedPeople(df, areasPerPerson):
    areas = df['Area'].values
    unqareas = df[['Area', 'Place']].drop_duplicates()['Area'].values
    places = df['Place'].values
    times = pd.to_datetime(df['Time']).values
    maxPerson = np.ceil(areas.size / float(areasPerPerson)) - 1

    assignmentCount = Counter()
    assignedPeople = []
    assignedPlaces = {}
    heldPeople = {}
    heldAreas = {}
    holdAvailable = True
    person = 0

    # search for repeated areas. Mark them if the next repeat occurs within an hour
    ixrep = np.argmax(np.triu(unqareas.reshape(-1, 1)==unqareas, k=1), axis=1)
    holds = np.zeros(areas.size, dtype=bool)
    holds[ixrep.nonzero()] = (times[ixrep[ixrep.nonzero()]] - times[ixrep.nonzero()]) < np.timedelta64(1, 'h')

    for area,place,hold in zip(areas, places, holds):
        if (area, place) in assignedPlaces:
            # this unique (area, place) has already been assigned to someone
            assignedPeople.append(assignedPlaces[(area, place)])
            continue

        if assignmentCount[person] >= areasPerPerson:
            # the current person is already assigned to enough areas, move on to the next
            a = heldPeople.pop(person, None)
            heldAreas.pop(a, None)
            person += 1

        if area in heldAreas:
            # assign to the person held in this area
            p = heldAreas.pop(area)
            heldPeople.pop(p)
        else:
            # get the first non-held person. If we need to hold in this area,
            # also make sure the person has at least 2 free assignment slots,
            # though if it's the last person assign to them anyway
            p = person
            while p in heldPeople or (hold and holdAvailable and (areasPerPerson - assignmentCount[p] < 2)) and not p==maxPerson:
                p += 1

        assignmentCount.update([p])
        assignedPlaces[(area, place)] = p
        assignedPeople.append(p)

        if hold:
            if p==maxPerson:
                # mark that there are no more people available to perform holds
                holdAvailable = False

            # this area recurs within an hour, mark that the person should be held here
            heldPeople[p] = area
            heldAreas[area] = p

    return assignedPeople

def allocatePeople(df, areasPerPerson=3):
    assignedPeople = getAssignedPeople(df, areasPerPerson=areasPerPerson)

    df = df.copy()
    df.loc[:,'Person'] = df['Person'].unique()[assignedPeople]
    return df

print(allocatePeople(df))
Output:
Time Place Area Person On
0 8:03:00 House 1 A Person 1 1
1 8:07:00 House 2 A Person 1 2
2 8:10:00 House 3 A Person 1 3
3 8:23:00 House 1 A Person 1 3
4 8:27:00 House 2 A Person 1 3
5 8:30:00 House 3 A Person 1 3
6 8:37:00 House 4 A Person 2 4
7 8:40:00 House 1 B Person 2 5
8 8:48:00 House 1 A Person 1 5
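One quick way to check an allocation like the one above against the cap of 3 is to count the distinct (Area, Place) jobs per person. This counts jobs over the whole period rather than at a single moment, which is the stricter of the two checks. It matches how the code above counts, since `assignmentCount` is only ever incremented. The output table is re-entered by hand here, so this is a sanity check on that result, not part of the answer's code.

```python
import pandas as pd

# The allocation printed above, re-entered by hand
out = pd.DataFrame({
    'Place':  ['House 1', 'House 2', 'House 3', 'House 1', 'House 2',
               'House 3', 'House 4', 'House 1', 'House 1'],
    'Area':   ['A', 'A', 'A', 'A', 'A', 'A', 'A', 'B', 'A'],
    'Person': ['Person 1'] * 6 + ['Person 2', 'Person 2', 'Person 1'],
})

# distinct (Area, Place) jobs per person over the whole period
jobs_per_person = (out.drop_duplicates(['Area', 'Place', 'Person'])
                      .groupby('Person').size())
print(jobs_per_person.to_dict())           # {'Person 1': 3, 'Person 2': 2}
print(bool((jobs_per_person <= 3).all()))  # True
```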