I am trying to assign a new column in a pandas df based on two other values.
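In the df below, for each individual value in Location (Home, Away, etc.), I want to assign an increasing integer to each block of 3 unique values in Day, with the numbering continuing across locations rather than restarting at 1.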
import pandas as pd
import numpy as np
d = ({
'Time' : ['7:00:00','8:00:00','9:00:00','11:00:00','12:00:00','1:00:00','2:00:00','3:00:00'],
'Day' : ['Mon','Tues','Wed','Thurs','Fri','Thurs','Fri','Sat'],
'Location' : ['Home','Home','Home','Away','Away','Home','Home','Home'],
})
df = pd.DataFrame(data=d)
#Assign values from Home
mask = df['Location'] == 'Home'
df1 = df[mask].drop_duplicates('Day')
d = dict(zip(df1['Day'], np.arange(len(df1)) // 3 + 1))
df.loc[mask, 'Assign'] = df.loc[mask, 'Day'].map(d)
#Assign values from Away
mask = df['Location'] == 'Away'
df1 = df[mask].drop_duplicates('Day')
d = dict(zip(df1['Day'], np.arange(len(df1)) // 3 + 1))
df.loc[mask, 'Assign'] = df.loc[mask, 'Day'].map(d)
Output:
Time Day Location Assign
0 7:00:00 Mon Home 1.0
1 8:00:00 Tues Home 1.0
2 9:00:00 Wed Home 1.0
3 11:00:00 Thurs Away 1.0
4 12:00:00 Fri Away 1.0
5 1:00:00 Thurs Home 2.0
6 2:00:00 Fri Home 2.0
7 3:00:00 Sat Home 2.0
Expected output:
Time Day Location Assign
0 7:00:00 Mon Home 1.0
1 8:00:00 Tues Home 1.0
2 9:00:00 Wed Home 1.0
3 11:00:00 Thurs Away 2.0
4 12:00:00 Fri Away 2.0
5 1:00:00 Thurs Home 3.0
6 2:00:00 Fri Home 3.0
7 3:00:00 Sat Home 3.0
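For reference, the bucketing expression used in the attempt restarts at 1 for every Location, which would explain why Away gets 1.0 in the current output. A minimal sketch of that arithmetic, assuming the counts of unique days in the sample df (6 for Home, 2 for Away); the variable names are only illustrative:

```python
import numpy as np

# 6 unique Home days (Mon, Tues, Wed, Thurs, Fri, Sat), bucketed 3 at a time
home_blocks = np.arange(6) // 3 + 1
# 2 unique Away days (Thurs, Fri); the counter starts again from 1
away_blocks = np.arange(2) // 3 + 1

print(home_blocks.tolist())  # [1, 1, 1, 2, 2, 2]
print(away_blocks.tolist())  # [1, 1]
```

Both locations produce a block numbered 1, so the per-location numbers alone cannot give a single increasing sequence.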
Answer 0 (score: 0)
I think you need GroupBy.apply with a custom function, and then factorize to convert the combined values into consecutive numbers:
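def f(x):
    # work on a copy so the group passed in by apply is not modified
    x = x.copy()
    # unique days of this location, in order of appearance
    x1 = x.drop_duplicates('Day')
    # every 3 unique days share one block number: 1, 1, 1, 2, 2, 2, ...
    d = dict(zip(x1['Day'], np.arange(len(x1)) // 3 + 1))
    x['new'] = x['Day'].map(d)
    return x
df = df.groupby('Location', sort=False, group_keys=False).apply(f)
# combine block number and location, then number the pairs in order of first appearance
df['new'] = pd.factorize(df['new'].astype(str) + df['Location'])[0] + 1
print(df)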
Time Day Location new
0 7:00:00 Mon Home 1
1 8:00:00 Tues Home 1
2 9:00:00 Wed Home 1
3 11:00:00 Thurs Away 2
4 12:00:00 Fri Away 2
5 1:00:00 Thurs Home 3
6 2:00:00 Fri Home 3
7 3:00:00 Sat Home 3
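The same two steps (a block number per location, then factorize on the combined key) can also be sketched without GroupBy.apply. This is only an illustrative variant, not the method above: it swaps in GroupBy.transform, and `block_number` is a hypothetical helper name.

```python
import pandas as pd

df = pd.DataFrame({
    'Time': ['7:00:00', '8:00:00', '9:00:00', '11:00:00',
             '12:00:00', '1:00:00', '2:00:00', '3:00:00'],
    'Day': ['Mon', 'Tues', 'Wed', 'Thurs', 'Fri', 'Thurs', 'Fri', 'Sat'],
    'Location': ['Home', 'Home', 'Home', 'Away', 'Away', 'Home', 'Home', 'Home'],
})

def block_number(days):
    # hypothetical helper: number the unique days in order of appearance
    # (0, 1, 2, ...), then put every 3 unique days into one block (1, 1, 1, 2, ...)
    codes, _ = pd.factorize(days)
    return pd.Series(codes // 3 + 1, index=days.index)

# block number of each row within its own Location
block = df.groupby('Location', sort=False)['Day'].transform(block_number)

# combine block and Location into one key and number the keys
# in order of first appearance
key = block.astype(str) + df['Location']
df['new'] = pd.factorize(key)[0] + 1

print(df['new'].tolist())  # [1, 1, 1, 2, 2, 3, 3, 3]
```

Because factorize numbers keys by first appearance, the result depends on row order: Home's first block is seen first, then Away's, then Home's second block.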
Another similar solution, using unique instead of drop_duplicates:
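def f(x):
    # work on a copy so the group passed in by apply is not modified
    x = x.copy()
    # unique days of this location, in order of appearance
    u = x['Day'].unique()
    d = dict(zip(u, np.arange(len(u)) // 3 + 1))
    x['new'] = x['Day'].map(d)
    return x
# group_keys=False keeps the original index, as in the first solution
df = df.groupby('Location', sort=False, group_keys=False).apply(f)
s = df['new'].astype(str) + df['Location']
df['new'] = pd.factorize(s)[0] + 1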
print(df)  # the output below is from a different, longer sample df (input not shown)
Day Location new
0 Mon Home 1
1 Tues Home 1
2 Wed Away 2
3 Wed Home 1
4 Thurs Away 2
5 Thurs Home 3
6 Fri Home 3
7 Mon Home 1
8 Sat Home 3
9 Fri Away 2
10 Sun Home 4
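The input for this last output is not shown, so the following is a sketch under an assumption: the df is rebuilt from the Day and Location columns printed above. It uses a plain loop over the groups instead of GroupBy.apply, which is a swapped-in technique chosen to keep the steps visible.

```python
import numpy as np
import pandas as pd

# assumption: input reconstructed from the Day and Location columns of the output
df = pd.DataFrame({
    'Day': ['Mon', 'Tues', 'Wed', 'Wed', 'Thurs', 'Thurs',
            'Fri', 'Mon', 'Sat', 'Fri', 'Sun'],
    'Location': ['Home', 'Home', 'Away', 'Home', 'Away', 'Home',
                 'Home', 'Home', 'Home', 'Away', 'Home'],
})

# block number of each row within its own Location
block = pd.Series(0, index=df.index)
for _, days in df.groupby('Location', sort=False)['Day']:
    u = days.unique()                              # unique days, in order of appearance
    d = dict(zip(u, np.arange(len(u)) // 3 + 1))   # 3 unique days per block
    block.loc[days.index] = days.map(d)

# number the (block, Location) pairs in order of first appearance
s = block.astype(str) + df['Location']
df['new'] = pd.factorize(s)[0] + 1

print(df['new'].tolist())  # [1, 1, 2, 1, 2, 3, 3, 1, 3, 2, 4]
```

Row 7 (Mon, Home) gets 1 again because Mon was already mapped to Home's first block, which matches the printed output.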