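I am trying to write a more efficient script that creates a new column based on the values in another column. The script below does this, but I can only select one string at a time. I would like to do it for all unique values.
For the df below, I currently run the script separately for each individual string in Location. Instead, I want to run it over all unique strings.
How the new column is assigned: for each string in Location, the unique items in Day are grouped three at a time, in order of first appearance. So, for each value in Location, the first three unique values in Day share one new string, the next three share another, and so on.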
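To make the rule concrete, here is a minimal sketch in plain Python for a single location. The helper name `label_days` and its `size` parameter are made up for illustration; the days listed are the Home rows of the sample data.

```python
def label_days(days, size=3):
    """Map each distinct day to a 1-based chunk number, in order of first appearance."""
    order = list(dict.fromkeys(days))  # distinct values, first-seen order
    return {day: pos // size + 1 for pos, day in enumerate(order)}

# Days recorded for Location == 'Home' in the sample data
home_days = ['Mon', 'Tues', 'Wed', 'Thurs', 'Fri', 'Mon', 'Sat', 'Sun']
print(label_days(home_days))
# {'Mon': 1, 'Tues': 1, 'Wed': 1, 'Thurs': 2, 'Fri': 2, 'Sat': 2, 'Sun': 3}
```

These are chunk numbers within one location only. The final labels (C1, C2, ...) also depend on the order in which each location's chunks first appear in the frame.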
import pandas as pd
import numpy as np
d = ({
'Day' : ['Mon','Tues','Wed','Wed','Thurs','Thurs','Fri','Mon','Sat','Fri','Sun'],
'Location' : ['Home','Home','Away','Home','Away','Home','Home','Home','Home','Away','Home'],
})
df = pd.DataFrame(data=d)
#Select value
mask = df['Location'] == 'Home'
df1 = df[mask].drop_duplicates('Day')
d = dict(zip(df1['Day'], np.arange(len(df1)) // 3 + 1))
df.loc[mask, 'Assign'] = df.loc[mask, 'Day'].map(d)
At the moment, I select each value in ['Location'] one at a time, e.g. mask = df['Location'] == 'Home'.
I would like to handle all values at once, e.g. mask = df['Location'] == All unique values
Expected output:
      Day Location Assign
0     Mon     Home     C1
1    Tues     Home     C1
2     Wed     Away     C2
3     Wed     Home     C1
4   Thurs     Away     C2
5   Thurs     Home     C3
6     Fri     Home     C3
7     Mon     Home     C1
8     Sat     Home     C3
9     Fri     Away     C2
10    Sun     Home     C4
Answer 0 (score: 1)
# DataFrame Given
df = pd.DataFrame({
'Day' : ['Mon','Tues','Mon','Wed','Thurs','Fri','Mon','Sat','Sun','Tues'],
'Location' : ['Home','Home','Away','Home','Home','Home','Home','Home','Home','Away'],
})
Unique_group = ['Mon','Tues','Wed']
df['Group'] = df['Day'].apply(lambda x:1 if x in Unique_group else 2)
df['Assign'] = np.zeros(len(df))
# Dictionary mapping the numeric codes to the output labels
vals = dict([(i,'C'+str(i)) for i in range(len(df))])
Loop over the dataframe, slicing it one row at a time, and check the earlier 'Assign' values to decide the new one:
for i in range(1,len(df)+1,1):
    # Slicing the Dataframe line by line
    df1 = df[:i]
    # Incorporating the conditions of Group and Location
    df1 = df1[(df1.Location == df1.Location.loc[i-1]) & (df1.Group == df1.Group.loc[i-1]) ]
    # Writing the 'Assign' value for the first line of sliced df
    if len(df1)==1:
        df.loc[i-1,'Assign'] = df[:i].Assign.max()+1
    # Reusing the previous 'Assign' value while its group still has fewer than 3 rows
    elif (df1.Assign.value_counts()[df1.Assign.max()] <3):
        df.loc[i-1,'Assign'] = df1.Assign.max()
    # Writing 'Assign' value for new group
    else:
        df.loc[i-1,'Assign'] = df[:i]['Assign'].max()+1
df.Assign = df.Assign.map(vals)
Out:
     Day Location  Group Assign
0    Mon     Home      1     C1
1   Tues     Home      1     C1
2    Mon     Away      1     C2
3    Wed     Home      1     C1
4  Thurs     Home      2     C3
5    Fri     Home      2     C3
6    Mon     Home      1     C4
7    Sat     Home      2     C3
8    Sun     Home      2     C5
9   Tues     Away      1     C2
Answer 1 (score: 1)
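Second attempt, and this one works.
The question was hard to understand.
I was sure this should be done with pandas groupby() and a dataframe merge. If you check the history of this reply, you can see how I changed the answer to replace slower Python code with fast pandas code.
The code below first computes the unique days for each location, then uses a helper dataframe to create the final values.
I recommend pasting this code into a Jupyter notebook and studying the intermediate steps.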
import pandas as pd
import numpy as np
d = ({
'Day' : ['Mon','Tues','Wed','Wed','Thurs','Thurs','Fri','Mon','Sat','Fri','Sun'],
'Location' : ['Home','Home','Away','Home','Away','Home','Home','Home','Home','Away','Home'],
})
df = pd.DataFrame(data=d)
# including the example result
df["example"] = pd.Series(["C" + str(e) for e in [1, 1, 2, 1, 2, 3, 3, 1, 3, 2, 4]])
# this groups days per location
s_grouped = df.groupby(["Location"])["Day"].unique()
# This is the 3 unique indicator per location
df["Pre-Assign"] = df.apply(
    lambda x: 1 + list(s_grouped[x["Location"]]).index(x["Day"]) // 3, axis=1
)
# Now we want these unique per combination
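# reset_index(drop=True) replaces .reset_index().drop("index", 1); the positional axis argument to drop() is rejected by pandas 2.x
df_pre = df[["Location", "Pre-Assign"]].drop_duplicates().reset_index(drop=True)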
df_pre["Assign"] = 'C' + (df_pre.index + 1).astype(str)
# result
df.merge(df_pre, on=["Location", "Pre-Assign"], how="left")
Result
Other dataframes/series:
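As a rough guide to those intermediate steps, the following self-contained sketch rebuilds the grouped series and the helper frame on the question's data and prints them. It follows the same logic as the code above, except that the row-wise apply is written as a list comprehension, which is my own simplification.

```python
import pandas as pd

df = pd.DataFrame({
    "Day": ["Mon", "Tues", "Wed", "Wed", "Thurs", "Thurs", "Fri", "Mon", "Sat", "Fri", "Sun"],
    "Location": ["Home", "Home", "Away", "Home", "Away", "Home", "Home", "Home", "Home", "Away", "Home"],
})

# Unique days per location, in order of first appearance
s_grouped = df.groupby("Location")["Day"].unique()

# Position of each row's day among its location's unique days, in chunks of 3
df["Pre-Assign"] = [
    1 + list(s_grouped[loc]).index(day) // 3
    for loc, day in zip(df["Location"], df["Day"])
]

# Helper frame: one row per (Location, Pre-Assign) pair, labelled in order of appearance
df_pre = df[["Location", "Pre-Assign"]].drop_duplicates().reset_index(drop=True)
df_pre["Assign"] = "C" + (df_pre.index + 1).astype(str)

print({loc: list(days) for loc, days in s_grouped.items()})
print(df_pre)
```

The helper frame has four rows, (Home, 1), (Away, 1), (Home, 2) and (Home, 3), labelled C1 to C4 in that order, which is why the left merge on both columns yields the expected labels.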
Answer 2 (score: 1)
You can use:
def f(x):
    #get unique days
    u = x['Day'].unique()
    #mapping dictionary
    d = dict(zip(u, np.arange(len(u)) // 3 + 1))
    x['new'] = x['Day'].map(d)
    return x
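#group_keys=False keeps the original index (pandas >= 2.0 otherwise adds Location as an index level)
df = df.groupby('Location', sort=False, group_keys=False).apply(f)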
#add Location column
s = df['new'].astype(str) + df['Location']
#encoding by factorize
df['new'] = pd.Series(pd.factorize(s)[0] + 1).map(str).radd('C')
print (df)
      Day Location new
0     Mon     Home  C1
1    Tues     Home  C1
2     Wed     Away  C2
3     Wed     Home  C1
4   Thurs     Away  C2
5   Thurs     Home  C3
6     Fri     Home  C3
7     Mon     Home  C1
8     Sat     Home  C3
9     Fri     Away  C2
10    Sun     Home  C4
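Because the behaviour of groupby.apply with respect to the index and the grouping column has changed across pandas versions, a variant of the same idea that avoids apply on the whole frame may be easier to keep working. The sketch below is an assumption-laden alternative, not the code above: it swaps the mapping dictionary for pd.factorize inside a column-level transform, and the names `chunk` and `key` are introduced only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Day': ['Mon', 'Tues', 'Wed', 'Wed', 'Thurs', 'Thurs', 'Fri', 'Mon', 'Sat', 'Fri', 'Sun'],
    'Location': ['Home', 'Home', 'Away', 'Home', 'Away', 'Home', 'Home', 'Home', 'Home', 'Away', 'Home'],
})

# Per-location chunk number: factorize numbers each day by first appearance (0, 1, 2, ...)
chunk = df.groupby('Location', sort=False)['Day'].transform(
    lambda s: pd.factorize(s)[0] // 3 + 1
)

# Combined key such as '1Home', '1Away', '2Home'; factorize numbers the keys in order of appearance
key = chunk.astype(str) + df['Location']
df['new'] = 'C' + pd.Series(pd.factorize(key)[0] + 1, index=df.index).astype(str)
print(df)
```

On the sample data this gives the same `new` column as shown above. Building the result Series with `index=df.index` keeps the assignment aligned even if the frame does not have a default integer index.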
Answer 3 (score: 0)
Not as pretty, but much faster than the groupby / apply approach...
def get_ordered_unique(a):
    u, idx = np.unique(a, return_index=True)
    # get ordered unique values
    return a[np.sort(idx)]
# split ordered unique value array into arrays of size 3
def find_ugrps(a):
    ord_u = get_ordered_unique(a)
    if ord_u.size > 3:
        split_idxs = [i for i in range(1, ord_u.size) if i % 3 == 0]
        u_grps = np.split(ord_u, split_idxs)
    else:
        u_grps = [ord_u]
    return u_grps
locs = pd.factorize(df.Location)[0] + 1
days = pd.factorize(df.Day)[0] + 1
assign = np.zeros(days.size).astype(int)
unique_locs = get_ordered_unique(locs)
i = 0
for loc in unique_locs:
    i += 1
    loc_idxs = np.where(locs == loc)[0]
    # find the ordered unique day values for each loc val slice
    these_unique_days = get_ordered_unique(days[loc_idxs])
    # split into ordered groups of three
    these_3day_grps = find_ugrps(these_unique_days)
    # assign integer for days found within each group
    for ugrp in these_3day_grps:
        day_idxs = np.where(np.isin(days[loc_idxs], ugrp))[0]
        np.put(assign, loc_idxs[day_idxs], i)
        i += 1
# set proper ordering within assign array using factorize
df['Assign'] = (pd.factorize(assign)[0] + 1)
df['Assign'] = 'C' + df['Assign'].astype(str)
print(df)
      Day Location Assign
0     Mon     Home     C1
1    Tues     Home     C1
2     Wed     Away     C2
3     Wed     Home     C1
4   Thurs     Away     C2
5   Thurs     Home     C3
6     Fri     Home     C3
7     Mon     Home     C1
8     Sat     Home     C3
9     Fri     Away     C2
10    Sun     Home     C4
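The two NumPy building blocks used above can be tried in isolation. The following is a small sketch on a made-up integer array (the values in `a` are arbitrary and not taken from the question): np.unique returns sorted values, so return_index is used to recover first-seen order, and np.split then cuts the result at every multiple of three.

```python
import numpy as np

a = np.array([3, 1, 3, 2, 1, 5, 4, 6, 7])  # arbitrary illustration data

# np.unique sorts its result; return_index gives the first position of each value
u, idx = np.unique(a, return_index=True)
ordered = a[np.sort(idx)]  # distinct values in first-seen order

# Cut points at every multiple of 3, the same positions find_ugrps computes
cuts = list(range(3, ordered.size, 3))
groups = np.split(ordered, cuts)

print(ordered.tolist())             # [3, 1, 2, 5, 4, 6, 7]
print([g.tolist() for g in groups])  # [[3, 1, 2], [5, 4, 6], [7]]
```

When the array has three or fewer distinct values, `cuts` is empty and np.split returns a single group, so the explicit size check in find_ugrps is a convenience rather than a requirement.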