I have a DataFrame with the following structure:
OMT object
ZIPCODE object
PRODUCT_CAT int64
SERVICE_CATEGORY object
CURRENT_STANDARD_EDD float64
TOTAL int64
DESTINATION_DISTRIBUTION_CTR object
OPS_EDD float64
OPS_EDD_achieve int64
suggest_edd_1 object
suggest_edd_2 int64
suggest_edd_value_1 int64
suggest_edd_value_2 int64
final_edd_group object
final_edd float64
final_edd_value int64
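I want to do the following:
When TOTAL < 5, return the label among D1 / D2 / D3 / D4 / D5 / D6 of the first column whose value minus D6 is greater than -1 (D6 if there is none).
When TOTAL >= 5, return the label among D1 / D2 / D3 / D4 / D5 / D6 of the first column whose value divided by TOTAL is greater than 0.95 (D6 if there is none).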
I wrote the following code, but it raises a MemoryError:
training_group['suggest_edd_1'] =np.where(training_group['TOTAL']>5,training_group[['D1','D2',
'D3','D4','D5',
'D6']].sub(training_group['D6'],axis =0).ge(-1).assign(D6=True).idxmax(1).str.extract('(\d+)'),
training_group[['D1','D2',
'D3','D4','D5',
'D6']].div(training_group['TOTAL'],axis =0).ge(0.95).assign(D6=True).idxmax(1).str.extract('(\d+)'))
<ipython-input-72-61626eae2be9> in <module>
4 training_group[['D1','D2',
5 'D3','D4','D5',
----> 6 'D6']].div(training_group['TOTAL'],axis =0).ge(OD_pari_target).assign(D6=True).idxmax(1).str.extract('(\d+)'))
MemoryError:
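Each expression works on its own, but it fails as soon as I apply the condition on TOTAL.
I also tried a lambda function applied to each row, but could not find suitable code to replace assign(D6=True) and the extract function: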
if x['TOTAL'] < piece_threthold:
    return x[['D1','D2',
              'D3','D4','D5',
              'D6']].sub(x['D6'],axis =0).ge(OD_pari_piece).ge(-1).idxmax(1)
else:
    return x[['D1','D2',
              'D3','D4','D5',
              'D6']].div(x['TOTAL'],axis =0).ge(OD_pari_target).ge(-1).idxmax(1)
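For reference, a row-wise version can avoid both calls: setting the D6 entry of the boolean mask to True stands in for assign(D6=True), and slicing the label stands in for the extract call. This is only a sketch under assumptions: the sample data, the function name suggest_for_row, and the defaults 5, -1 and 0.95 (standing in for piece_threthold, OD_pari_piece and OD_pari_target) are illustrative, not taken from the real data.

```python
import pandas as pd

COLS = ["D1", "D2", "D3", "D4", "D5", "D6"]


def suggest_for_row(row, threshold=5, min_diff=-1, min_ratio=0.95):
    """Return the digit of the first column that passes the rule, else '6'.

    The default values are assumptions made for this sketch.
    """
    values = row[COLS].astype(float)
    if row["TOTAL"] < threshold:
        # Small groups: compare each column with D6.
        passed = (values - values["D6"]) >= min_diff
    else:
        # Larger groups: compare each column's share of TOTAL.
        passed = (values / row["TOTAL"]) >= min_ratio
    passed["D6"] = True               # fallback, like assign(D6=True)
    label = passed[passed].index[0]   # first label whose test is True
    return label[1:]                  # "D3" -> "3", like str.extract


# Made-up sample rows, one per case.
df = pd.DataFrame({
    "TOTAL": [3, 10, 10],
    "D1": [0, 5, 1],
    "D2": [1, 9.6, 2],
    "D3": [2, 10, 3],
    "D4": [2, 10, 4],
    "D5": [3, 10, 5],
    "D6": [3, 10, 6],
})
result = df.apply(suggest_for_row, axis=1)
```

Row-wise apply is usually much slower than the column-wise expressions, so this is mainly useful for checking the logic on a small sample.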
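I can get the desired result by doing the following. However, it feels inefficient and creates more columns than I need. (Since I only need final_suggest, I drop suggest_edd_1 and suggest_edd_2 afterwards.)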
training_group['suggest_edd_1'] = training_group[['D1','D2',
'D3','D4','D5',
'D6']].sub(training_group['D6'],axis =0).ge(OD_pari_piece).assign(D6=True).idxmax(1).str.extract('(\d+)')
training_group['suggest_edd_2'] = training_group[['D1','D2',
'D3','D4','D5',
'D6']].div(training_group['TOTAL'],axis =0).ge(OD_pari_target).assign(D6=True).idxmax(1).str.extract('(\d+)')
training_group['final_suggest'] = np.where(training_group['TOTAL']>5,training_group['suggest_edd_1'] ,training_group['suggest_edd_2'])
Answer 0 (score: 0)
Since each expression works fine on its own on your side, precompute the values to assign:
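cols = ['D1', 'D2', 'D3', 'D4', 'D5', 'D6']
# expand=False makes str.extract return a 1-D Series. The default returns a
# one-column DataFrame, which np.where broadcasts against TOTAL into an n x n array.
s1 = (training_group[cols].sub(training_group['D6'], axis=0).ge(-1)
      .assign(D6=True).idxmax(axis=1).str.extract(r'(\d+)', expand=False))
s2 = (training_group[cols].div(training_group['TOTAL'], axis=0).ge(0.95)
      .assign(D6=True).idxmax(axis=1).str.extract(r'(\d+)', expand=False))
training_group['suggest_edd_1'] = np.where(training_group['TOTAL'] > 5, s1, s2)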
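A possible explanation for the MemoryError, stated as an assumption because the traceback does not show array shapes: str.extract returns a one-column DataFrame by default, and np.where broadcasts a length-n condition against an (n, 1) array into an (n, n) result. Passing expand=False to str.extract gives a Series instead, so everything stays one-dimensional. The toy data below is made up purely to show the shapes.

```python
import numpy as np
import pandas as pd

# Made-up labels and condition, four rows each.
labels = pd.Series(["D1", "D3", "D6", "D2"])
cond = pd.Series([True, False, True, False])

# Default expand=True: one capture group still yields a DataFrame, shape (4, 1).
as_frame = labels.str.extract(r"(\d+)")

# expand=False with a single capture group yields a Series, shape (4,).
as_series = labels.str.extract(r"(\d+)", expand=False)

# A (4,) condition against (4, 1) inputs broadcasts to (4, 4):
# n * n cells for n rows, which can exhaust memory on a large frame.
broadcast = np.where(cond, as_frame, as_frame)

# A (4,) condition against (4,) inputs stays 1-D.
aligned = np.where(cond, as_series, as_series)

print(as_frame.shape, as_series.shape, broadcast.shape, aligned.shape)
```

This would also be consistent with the two-step workaround in the question succeeding: assigning the one-column result to a DataFrame column first turns it into an ordinary column, so the later np.where call only sees one-dimensional inputs.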