I'm trying to reassign values in a pandas df column.
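For the code below, I want to reassign the [Person] column. Specifically, if any Person has fewer than 3 unique values, I want to combine them with other Persons. If a Person has exactly 3 unique values, it should be left as is.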
Example:
Person 1, A
Person 1, B
Person 2, C
Person 1, D
Person 2, E
Person 3, F
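Using the example above, Person 1 would be kept as is because it has 3 unique values (A, B, D). Person 2 and Person 3 have 2 and 1 unique values respectively, so Person 3 would be grouped with Person 2.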
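The rule above comes down to counting how many distinct values each Person has and comparing that count with 3. Here is a minimal sketch using the toy example. The column name `Value` is only for illustration and is not part of the real data.

```python
import pandas as pd

# The toy example from above: each Person paired with a letter value.
# 'Value' is a hypothetical column name used only for this illustration.
example = pd.DataFrame({
    'Person': ['Person 1', 'Person 1', 'Person 2', 'Person 1', 'Person 2', 'Person 3'],
    'Value':  ['A', 'B', 'C', 'D', 'E', 'F'],
})

# Number of distinct values per Person; this is the quantity compared against 3.
unique_counts = example.groupby('Person')['Value'].nunique()
print(unique_counts.to_dict())  # {'Person 1': 3, 'Person 2': 2, 'Person 3': 1}
```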
Here is what I have so far:

```python
import pandas as pd

d = {
    'Time' : ['8:03:00','8:17:00','8:20:00','10:15:00','10:15:00','11:48:00','12:00:00','12:10:00'],
    'Place' : ['House 1','House 2','House 1','House 3','House 4','House 5','House 1','House 1'],
    'Area' : ['X','X','Y','X','X','X','X','X'],
    'Person' : ['Person 1','Person 1','Person 2','Person 1','Person 3','Person 3','Person 1','Person 1'],
}
df = pd.DataFrame(data=d)

n = 3
# Flag Persons whose number of rows equals n (this counts rows, duplicates included)
df['complete'] = df.Person.apply(lambda x: 1 if df.Person.tolist().count(x) == n else 0)
df['num'] = df.Person.str.replace('Person ', '')
df.sort_values(by=['num', 'complete'], ascending=True, inplace=True)

# Build the sequence 1, 1, 1, 2, 2, 2, 3, ... (a new Person every n rows)
c = 0
person_numbers = []
for x in range(0, 999):
    if x % n == 0:
        c += 1
    person_numbers.append(c)

df['Person_new'] = person_numbers[0:len(df)]
df.Person = 'Person ' + df.Person_new.astype(str)
df.drop(['complete', 'Person_new', 'num'], axis=1, inplace=True)

df['Time'] = pd.to_timedelta(df['Time'])
df = df.sort_values(by='Time')
```
Output:

```
       Time    Place Area    Person
0  08:03:00  House 1    X  Person 1
1  08:17:00  House 2    X  Person 1
2  08:20:00  House 1    Y  Person 2
3  10:15:00  House 3    X  Person 1
4  10:15:00  House 4    X  Person 3
5  11:48:00  House 5    X  Person 3
6  12:00:00  House 1    X  Person 2
7  12:10:00  House 1    X  Person 2
```
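This doesn't recognise the duplicate rows at index 6-7. They are the same as index 0, so they should be assigned to Person 1. The code doesn't account for duplicate values. If I remove the duplicated rows the code works as intended, but my actual dataset contains many duplicates, so dropping them isn't an option.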
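A quick way to see the gap between what the code counts and what the rule needs is to compare row counts with distinct counts. This sketch assumes two rows are duplicates when they share the same Person, Place and Area, with Time ignored.

```python
import pandas as pd

d = {
    'Time': ['8:03:00', '8:17:00', '8:20:00', '10:15:00', '10:15:00', '11:48:00', '12:00:00', '12:10:00'],
    'Place': ['House 1', 'House 2', 'House 1', 'House 3', 'House 4', 'House 5', 'House 1', 'House 1'],
    'Area': ['X', 'X', 'Y', 'X', 'X', 'X', 'X', 'X'],
    'Person': ['Person 1', 'Person 1', 'Person 2', 'Person 1', 'Person 3', 'Person 3', 'Person 1', 'Person 1'],
}
df = pd.DataFrame(d)

# What the original code compares against n: the number of rows per Person.
row_counts = df['Person'].value_counts().sort_index()

# True for rows that repeat an earlier (Person, Place, Area) combination.
dupes = df.duplicated(subset=['Person', 'Place', 'Area'])

# Distinct (Place, Area) combinations per Person once those repeats are dropped.
unique_counts = df[~dupes].groupby('Person').size()

print(row_counts.to_dict())      # {'Person 1': 5, 'Person 2': 1, 'Person 3': 2}
print(df.index[dupes].tolist())  # [6, 7]
print(unique_counts.to_dict())   # {'Person 1': 3, 'Person 2': 1, 'Person 3': 2}
```

With the repeats at index 6-7 removed, Person 1 has exactly 3 unique values, which matches the expected output below.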
Expected output:

```
       Time    Place Area    Person
0   8:03:00  House 1    X  Person 1
1   8:17:00  House 2    X  Person 1
2   8:20:00  House 1    Y  Person 2
3  10:15:00  House 3    X  Person 1
4  10:15:00  House 4    X  Person 2
5  11:48:00  House 5    X  Person 2
6  12:00:00  House 1    X  Person 1
7  12:10:00  House 1    X  Person 1
```
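Below is one possible approach, offered as a sketch under stated assumptions rather than a definitive solution. It assumes a "unique value" is a distinct (Place, Area) pair, as the duplicates at index 6-7 suggest. It also assumes no Person has more than n unique values.

The idea is to apply the original "new Person every n" chunking to distinct (Person, Place, Area) rows instead of raw rows, then map the new labels back onto every row, duplicates included. Persons that already have exactly n unique values are placed first, so each fills one chunk intact. Like the original code, it renumbers Persons sequentially. The helper columns `first_seen`, `num`, `complete` and `new` are illustrative names.

```python
import pandas as pd

n = 3  # target number of unique (Place, Area) values per Person

d = {
    'Time': ['8:03:00', '8:17:00', '8:20:00', '10:15:00', '10:15:00', '11:48:00', '12:00:00', '12:10:00'],
    'Place': ['House 1', 'House 2', 'House 1', 'House 3', 'House 4', 'House 5', 'House 1', 'House 1'],
    'Area': ['X', 'X', 'Y', 'X', 'X', 'X', 'X', 'X'],
    'Person': ['Person 1', 'Person 1', 'Person 2', 'Person 1', 'Person 3', 'Person 3', 'Person 1', 'Person 1'],
}
df = pd.DataFrame(d)

keys = ['Person', 'Place', 'Area']

# One row per distinct (Person, Place, Area); repeated visits collapse here.
uniq = df.drop_duplicates(subset=keys)[keys].copy()
uniq['first_seen'] = range(len(uniq))  # tie-breaker that keeps the original order
uniq['num'] = uniq['Person'].str.replace('Person ', '').astype(int)
# True when this Person already has exactly n unique values (assumption: never more than n).
uniq['complete'] = uniq['Person'].map(uniq['Person'].value_counts()) == n

# Complete Persons first (each fills exactly one block of n), then the others in Person order.
uniq = uniq.sort_values(['complete', 'num', 'first_seen'], ascending=[False, True, True])

# A new label every n unique values: Person 1, 1, 1, 2, 2, 2, ...
uniq['new'] = ['Person ' + str(i // n + 1) for i in range(len(uniq))]

# Map the new label back onto every original row; a left merge keeps df's row order.
out = df.merge(uniq[keys + ['new']], on=keys, how='left')
out['Person'] = out.pop('new')
print(out)
```

Because duplicates are collapsed before chunking and re-expanded by the merge, rows 6-7 inherit whatever label row 0 gets. If a Person could have more than n unique values, the `complete` test would need to change, since such a Person would currently be pooled and split across chunks.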
Answer (score: 0)
Try grouping the data, then iterate over the groups and apply your conditions as needed.
```python
import pandas as pd

d = {
    'Time' : ['8:03:00','8:17:00','8:20:00','10:15:00','10:15:00','11:48:00','12:00:00','12:10:00'],
    'Place' : ['House 1','House 2','House 1','House 3','House 4','House 5','House 1','House 1'],
    'Area' : ['X','X','Y','X','X','X','X','X'],
    'Person' : ['Person 1','Person 1','Person 2','Person 1','Person 3','Person 3','Person 1','Person 1'],
}
df = pd.DataFrame(data=d)

grouper = df.groupby(['Person', 'Area', 'Place'])

parts = []
for key, group in grouper:
    group = group.copy()  # work on a copy so the assignment below doesn't modify a view
    # do what you want to each group here
    if len(group.index) >= 2:
        # reassign to Person 1?
        group['Person'] = 'Person 1'
    parts.append(group)

# DataFrame.append was removed in pandas 2.0, so concatenate the collected groups instead
new_df = pd.concat(parts)
```
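To see which rows the loop actually reassigns, you can look at the group sizes directly. Here is a minimal check on the question's data, using the same grouping keys as the loop:

```python
import pandas as pd

d = {
    'Time': ['8:03:00', '8:17:00', '8:20:00', '10:15:00', '10:15:00', '11:48:00', '12:00:00', '12:10:00'],
    'Place': ['House 1', 'House 2', 'House 1', 'House 3', 'House 4', 'House 5', 'House 1', 'House 1'],
    'Area': ['X', 'X', 'Y', 'X', 'X', 'X', 'X', 'X'],
    'Person': ['Person 1', 'Person 1', 'Person 2', 'Person 1', 'Person 3', 'Person 3', 'Person 1', 'Person 1'],
}
df = pd.DataFrame(d)

# Rows per (Person, Area, Place) group, the same grouping the loop iterates over.
sizes = df.groupby(['Person', 'Area', 'Place']).size()

# Only these groups enter the `len(group.index) >= 2` branch.
print(sizes[sizes >= 2])
```

On this data only the ('Person 1', 'X', 'House 1') group, rows 0, 6 and 7, has more than one row. The condition therefore relabels rows that are already Person 1, and the Person 2 and Person 3 rows pass through unchanged.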