How to replace number ranges in a column with categorical values when the ranges are floats

Time: 2018-06-21 17:12:09

Tags: python numpy dataframe range multiple-conditions

df['ratio_usage'] = np.where(df['ratio_usage'].between(0.9,0.1), 'Excellent', df['ratio_usage'])
df['ratio_usage'] = np.where(df['ratio_usage'].between(0.8,0.89), 'Very Good', df['ratio_usage'])
df['ratio_usage'] = np.where(df['ratio_usage'].between(0.7,0.79), 'Good', df['ratio_usage'])
df['ratio_usage'] = np.where(df['ratio_usage'].between(0.6,0.69), 'Fair', df['ratio_usage'])
df['ratio_usage'] = np.where(df['ratio_usage'].between(0.5,0.59), 'Satisfactory', df['ratio_usage'])
df['ratio_usage'] = np.where(df['ratio_usage'].between(0.4,0.49), 'Poor', df['ratio_usage'])
df['ratio_usage'] = np.where(df['ratio_usage'].between(0.3,0.0), 'Very Poor', df['ratio_usage'])
df['ratio_usage'] = np.where(df['ratio_usage'].between(1.01,2), 'Fatal', df['ratio_usage'])
df['ratio_usage'] = np.where(df['ratio_usage'].between(2.1,1000), 'Outliers', df['ratio_usage'])

It executes the first line of code and performs the replacement, but then produces the following error:

TypeError                                 Traceback (most recent call last)
<ipython-input-269-7ad3204ddca1> in <module>()
      1 df['ratio_usage'] = np.where(df['ratio_usage'].between(0.9,0.1), 'Excellent', df['ratio_usage'])
----> 2 df['ratio_usage'] = np.where(df['ratio_usage'].between(0.8,0.89), 'Very Good', df['ratio_usage'])
      3 df['ratio_usage'] = np.where(df['ratio_usage'].between(0.7,0.79), 'Good', df['ratio_usage'])
      4 df['ratio_usage'] = np.where(df['ratio_usage'].between(0.6,0.69), 'Fair', df['ratio_usage'])
      5 df['ratio_usage'] = np.where(df['ratio_usage'].between(0.5,0.59), 'Satisfactory', df['ratio_usage'])

~\Anaconda\lib\site-packages\pandas\core\series.py in between(self, left, right, inclusive)
   3654         """
   3655         if inclusive:
-> 3656             lmask = self >= left
   3657             rmask = self <= right
   3658         else:

~\Anaconda\lib\site-packages\pandas\core\ops.py in wrapper(self, other, axis)
   1251 
   1252             with np.errstate(all='ignore'):
-> 1253                 res = na_op(values, other)
   1254             if is_scalar(res):
   1255                 raise TypeError('Could not compare {typ} type with Series'

~\Anaconda\lib\site-packages\pandas\core\ops.py in na_op(x, y)
   1138 
   1139         elif is_object_dtype(x.dtype):
-> 1140             result = _comp_method_OBJECT_ARRAY(op, x, y)
   1141 
   1142         elif is_datetimelike_v_numeric(x, y):

~\Anaconda\lib\site-packages\pandas\core\ops.py in _comp_method_OBJECT_ARRAY(op, x, y)
   1117         result = libops.vec_compare(x, y, op)
   1118     else:
-> 1119         result = libops.scalar_compare(x, y, op)
   1120     return result
   1121 

pandas\_libs\ops.pyx in pandas._libs.ops.scalar_compare()

TypeError: '>=' not supported between instances of 'str' and 'float'
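The traceback can be reproduced in isolation. The sketch below uses a small made-up Series (the values and variable names are hypothetical, not the asker's data) and assumes NumPy's usual behaviour of promoting a mix of strings and floats to a string array. Because between(0.9, 0.1) has its lower bound above its upper bound, nothing matches, yet np.where still converts every float to text, so the next between call ends up comparing str with float.

```python
import numpy as np
import pandas as pd

# Hypothetical sample values standing in for df['ratio_usage']
ratios = pd.Series([0.05, 0.8, 0.64])

# left > right, so this mask is all False and nothing is labelled 'Excellent'
mask = ratios.between(0.9, 0.1)

# np.where returns a single array with a single dtype; mixing a str with
# floats yields strings, so every float becomes text
converted = pd.Series(np.where(mask, 'Excellent', ratios))

# The second between() call now compares strings with floats and fails
try:
    converted.between(0.8, 0.89)
    second_call_failed = False
except TypeError:
    second_call_failed = True

print(mask.any(), second_call_failed)
```

This suggests two separate issues in the question's code: the reversed bounds in the first and seventh lines, and the overwriting of the numeric column with strings after the first call.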

1 Answer:

Answer 0 (score: 1)

Here is a solution using pd.cut. It is simplified because I can't see your data, and because you have overlapping bins that you will need to adjust.

Setup

df = pd.DataFrame({'ratio_usage': [0.05, 0.8, 0.64, 0.59, 0.31]})

   ratio_usage
0         0.05
1         0.80
2         0.64
3         0.59
4         0.31

pd.cut (with bins and labels)
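A minimal sketch of a pd.cut call with explicit bins and labels, applied to the same sample values. The bin edges, the choice of right=False, and the output column name usage_label are assumptions made for illustration, loosely derived from the ranges in the question; they are not the answerer's exact values and should be adjusted to the real data.

```python
import pandas as pd

df = pd.DataFrame({'ratio_usage': [0.05, 0.8, 0.64, 0.59, 0.31]})

# Assumed, non-overlapping bin edges; must be strictly increasing
edges = [0, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.01, 2.1, 1000]

# One label per interval, so len(labels) == len(edges) - 1
labels = ['Very Poor', 'Poor', 'Satisfactory', 'Fair', 'Good',
          'Very Good', 'Excellent', 'Fatal', 'Outliers']

# right=False makes each interval closed on the left: [0.8, 0.9) and so on.
# Writing to a new (hypothetical) column keeps the numeric values intact.
df['usage_label'] = pd.cut(df['ratio_usage'], bins=edges,
                           labels=labels, right=False)

print(df)
#    ratio_usage   usage_label
# 0         0.05     Very Poor
# 1         0.80     Very Good
# 2         0.64          Fair
# 3         0.59  Satisfactory
# 4         0.31     Very Poor
```

A single pd.cut call bins the whole column in one pass against the original floats, so the column is never turned into strings partway through, which is what triggers the TypeError with chained np.where calls.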