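I'm trying to use numpy's select to build a column of data from two conditions. The conditions are in a list, and I have tested each one separately to confirm it pulls the expected data. When I actually apply select, I get the following error: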
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-151-6994e3f46efb> in <module>
8 replace = [600, 675, 710, 745, 999]
9
---> 10 train_df3_dummies['credit_C5_score'] = np.select(condition, replace, default = 1)
C:\ProgramData\Anaconda3\lib\site-packages\numpy\lib\function_base.py in select(condlist, choicelist, default)
698 # as the shape is needed for the result. Doing it separately optimizes
699 # for example when all choices are scalars.
--> 700 condlist = np.broadcast_arrays(*condlist)
701 choicelist = np.broadcast_arrays(*choicelist)
702
C:\ProgramData\Anaconda3\lib\site-packages\numpy\lib\stride_tricks.py in broadcast_arrays(*args, **kwargs)
257 args = [np.array(_m, copy=False, subok=subok) for _m in args]
258
--> 259 shape = _broadcast_shape(*args)
260
261 if all(array.shape == shape for array in args):
C:\ProgramData\Anaconda3\lib\site-packages\numpy\lib\stride_tricks.py in _broadcast_shape(*args)
191 # use the old-iterator because np.nditer does not handle size 0 arrays
192 # consistently
--> 193 b = np.broadcast(*args[:32])
194 # unfortunately, it cannot handle 32 or more arguments directly
195 for pos in range(32, len(args), 31):
ValueError: shape mismatch: objects cannot be broadcast to a single shape
This is the code being used:
condition = [(train_df3_dummies.loc[(train_df3_dummies['credit_model_C5'] == 1) & (train_df3_dummies['credit_number'] == 600)])
,(train_df3_dummies.loc[(train_df3_dummies['credit_model_C5'] == 1) & (train_df3_dummies['credit_number'] == 675)])
,(train_df3_dummies.loc[(train_df3_dummies['credit_model_C5'] == 1) & (train_df3_dummies['credit_number'] == 710)])
,(train_df3_dummies.loc[(train_df3_dummies['credit_model_C5'] == 1) & (train_df3_dummies['credit_number'] == 745)])
,(train_df3_dummies.loc[(train_df3_dummies['credit_model_C5'] == 1) & (train_df3_dummies['credit_number'] == 999)])]
replace = [600, 675, 710, 745, 999]
train_df3_dummies['credit_C5_score'] = np.select(condition, replace, default = 1)
I've seen this error come up here for geometry problems, but not for numpy.select. Any ideas?
Answer 0 (score: 1)
The mismatch is most likely caused by:
train_df3_dummies.loc[(train_df3_dummies['credit_model_C5'] == 1 ....
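That is, each of your conditions has a different length: .loc[...] returns the filtered rows rather than a boolean mask, so the five results cannot be broadcast to a single shape. Get rid of loc, i.e.: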
condition = [(train_df3_dummies['credit_model_C5'] ==1) & (train_df3_dummies['credit_number']==600),...
]
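To see why dropping loc matters, here is a minimal sketch on a made-up frame. The data in `df` and the names `filtered` and `masks` are assumptions for illustration, not the asker's actual data. It compares the shapes produced by the two styles of condition and then runs np.select on the boolean masks.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for train_df3_dummies
df = pd.DataFrame({
    "credit_model_C5": [1, 1, 1, 1, 1, 0, 1],
    "credit_number":   [600, 600, 675, 675, 675, 600, 999],
})
replace = [600, 675, 710, 745, 999]

# .loc[mask] returns filtered DataFrames, so each one has its own row count
filtered = [
    df.loc[(df["credit_model_C5"] == 1) & (df["credit_number"] == value)]
    for value in replace
]
filtered_rows = [len(f) for f in filtered]

# Plain boolean masks always have one entry per row of df
masks = [
    (df["credit_model_C5"] == 1) & (df["credit_number"] == value)
    for value in replace
]
mask_lengths = [len(m) for m in masks]

# With equally sized masks, np.select can pick one value per row
df["credit_C5_score"] = np.select(masks, replace, default=1)

print(filtered_rows)
print(mask_lengths)
print(df["credit_C5_score"].tolist())
```

The filtered frames have differing row counts, while every mask has the same length as the frame, which is what np.select needs.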
You could also do this:
s = ((train_df3_dummies['credit_model_C5'] == 1) &
     train_df3_dummies['credit_number'].isin(replace))
train_df3_dummies['credit_C5_score'] = np.where(s,
                                                train_df3_dummies['credit_number'], 1)
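As a quick sanity check, the sketch below compares the isin/np.where shortcut with the np.select version on a small made-up frame. The `df` data and the names `via_select` and `via_where` are hypothetical. The shortcut works here because each replacement value equals the credit_number it is matched on.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for train_df3_dummies
df = pd.DataFrame({
    "credit_model_C5": [1, 1, 0, 1, 1],
    "credit_number":   [600, 675, 600, 999, 500],
})
replace = [600, 675, 710, 745, 999]

# One boolean mask per replacement value, as in the np.select approach
masks = [
    (df["credit_model_C5"] == 1) & (df["credit_number"] == value)
    for value in replace
]
via_select = np.select(masks, replace, default=1)

# Single combined mask: keep credit_number where it matches, else 1
s = (df["credit_model_C5"] == 1) & df["credit_number"].isin(replace)
via_where = np.where(s, df["credit_number"], 1)

same = np.array_equal(via_select, via_where)
print(via_where.tolist(), same)
```

If the replacement values ever differ from the matched credit_number values, the np.select form is the one to keep.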