Median of a string column in a Pandas DataFrame

Time: 2019-03-06 18:03:46

Tags: python pandas

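I need the median of a pandas DataFrame column that holds string values, but I get the error below and I don't understand why. I expected it to give me the most frequently repeated value instead. Why is the median function trying to convert the values to float?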

df_train["Electrical"]
0       SBrkr
1       SBrkr
2       SBrkr
3       SBrkr
4       SBrkr

Error:

df_train["Electrical"].median()

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/opt/conda/lib/python3.6/site-packages/pandas/core/nanops.py in f(values, axis, skipna, **kwds)
    127                 else:
--> 128                     result = alt(values, axis=axis, skipna=skipna, **kwds)
    129             except Exception:

/opt/conda/lib/python3.6/site-packages/pandas/core/nanops.py in nanmedian(values, axis, skipna)
    379     if not is_float_dtype(values):
--> 380         values = values.astype('f8')
    381         values[mask] = np.nan

ValueError: could not convert string to float: 'SBrkr'

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
/opt/conda/lib/python3.6/site-packages/pandas/core/nanops.py in f(values, axis, skipna, **kwds)
    130                 try:
--> 131                     result = alt(values, axis=axis, skipna=skipna, **kwds)
    132                 except ValueError as e:

/opt/conda/lib/python3.6/site-packages/pandas/core/nanops.py in nanmedian(values, axis, skipna)
    379     if not is_float_dtype(values):
--> 380         values = values.astype('f8')
    381         values[mask] = np.nan

ValueError: could not convert string to float: 'SBrkr'

During handling of the above exception, another exception occurred:

TypeError                                 Traceback (most recent call last)
<ipython-input-89-79051d8f64cf> in <module>()
----> 1 df_train["Electrical"].median()

/opt/conda/lib/python3.6/site-packages/pandas/core/generic.py in stat_func(self, axis, skipna, level, numeric_only, **kwargs)
   9611                                       skipna=skipna)
   9612         return self._reduce(f, name, axis=axis, skipna=skipna,
-> 9613                             numeric_only=numeric_only)
   9614 
   9615     return set_function_name(stat_func, name, cls)

/opt/conda/lib/python3.6/site-packages/pandas/core/series.py in _reduce(self, op, name, axis, skipna, numeric_only, filter_type, **kwds)
   3219                                           'numeric_only.'.format(name))
   3220             with np.errstate(all='ignore'):
-> 3221                 return op(delegate, skipna=skipna, **kwds)
   3222 
   3223         return delegate._reduce(op=op, name=name, axis=axis, skipna=skipna,

/opt/conda/lib/python3.6/site-packages/pandas/core/nanops.py in _f(*args, **kwargs)
     75             try:
     76                 with np.errstate(invalid='ignore'):
---> 77                     return f(*args, **kwargs)
     78             except ValueError as e:
     79                 # we want to transform an object array

/opt/conda/lib/python3.6/site-packages/pandas/core/nanops.py in f(values, axis, skipna, **kwds)
    137 
    138                     if is_object_dtype(values):
--> 139                         raise TypeError(e)
    140                     raise
    141 

TypeError: could not convert string to float: 'SBrkr'
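The failure can be reproduced with a small hypothetical Series standing in for df_train["Electrical"] (the values below are made up). median() needs numeric data, so pandas tries to cast the strings to float and gives up; the exact exception type and message may vary between pandas versions, which is why this sketch catches both TypeError and ValueError. mode(), by contrast, works on any hashable values.

```python
import pandas as pd

# Hypothetical stand-in for df_train["Electrical"]; the values are made up.
electrical = pd.Series(["SBrkr", "SBrkr", "FuseA", "SBrkr", "FuseF"], name="Electrical")

try:
    # median() needs numbers, so pandas tries to cast the strings to float.
    electrical.median()
    median_failed = False
except (TypeError, ValueError) as exc:
    median_failed = True
    print(type(exc).__name__, exc)

# mode() returns the most frequent value(s) as a Series; take the first one.
most_common = electrical.mode().iloc[0]
print(most_common)  # SBrkr
```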

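I searched Google and looked through Stack Overflow questions, but found nothing that solved my problem. So how can I get the median with pandas? Thanks for considering my question.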

3 answers:

Answer 0 (score: 2)

In addition to bart cubrich's answer, if the main goal is to find how many times the most frequent value occurs, you can also do this:

import pandas as pd

df["name"].value_counts().max()
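Note that .max() on the counts gives the number of occurrences, not the value itself. A small sketch with made-up data (the column name "name" is taken from the answer above) shows the difference between .max() and .idxmax():

```python
import pandas as pd

# Hypothetical data; "name" is the column name used in the answer above.
df = pd.DataFrame({"name": ["SBrkr", "FuseA", "SBrkr", "SBrkr", "FuseF"]})

counts = df["name"].value_counts()  # occurrences per value, sorted in descending order
top_count = counts.max()            # 3: how often the most frequent value occurs
top_value = counts.idxmax()         # SBrkr: the most frequent value itself
print(top_count, top_value)
```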

Answer 1 (score: 1)

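The median is the value at position (n + 1) / 2 of the sorted set, where n is the number of items in the set; when n is even, that position falls between two items and the median is the average of those two middle values.

But you are trying to apply it to non-numeric strings.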
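A minimal sketch of the position rule in plain Python (the function name median_by_position is made up for illustration): sort the values, take the item at 1-based position (n + 1) / 2 for odd n, and average the two middle items for even n. That averaging step is where strings break down: they can be sorted but not averaged.

```python
def median_by_position(values):
    """Median via the (n + 1) / 2 position rule (1-based positions)."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty sequence")
    if n % 2 == 1:
        # Odd n: the median sits exactly at 1-based position (n + 1) / 2.
        return ordered[(n + 1) // 2 - 1]
    # Even n: (n + 1) / 2 falls between two items, so average them.
    # This arithmetic is why a median needs numeric data.
    lower = ordered[n // 2 - 1]
    upper = ordered[n // 2]
    return (lower + upper) / 2


print(median_by_position([7, 1, 5]))     # 5
print(median_by_position([7, 1, 5, 3]))  # 4.0
```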

If you want the most frequent value, try

df_train["Electrical"].value_counts().idxmax()
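A quick check with made-up values: without the parentheses, .idxmax is only a reference to the method, and calling it returns the label with the highest count. Because value_counts() sorts counts in descending order, .index[0] gives the same label.

```python
import pandas as pd

# Made-up values standing in for df_train["Electrical"].
electrical = pd.Series(["SBrkr", "FuseA", "SBrkr", "FuseF", "SBrkr"])
counts = electrical.value_counts()

print(callable(counts.idxmax))  # True: without (), this is just the method object
print(counts.idxmax())          # SBrkr: the label with the highest count
print(counts.index[0])          # SBrkr: first label of the descending counts
```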

Answer 2 (score: 0)

There are many ways to do this. You can use mode(), or group by the column you are interested in:

import numpy as np
import pandas as pd

df_train=pd.DataFrame(np.random.random((10, 2)), columns=['x','y'])
df_train['Electrical']=['a','a','a','a','a','a','b','b','b','b']

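# method 1: mode() returns a Series of the most frequent value(s)
print(df_train['Electrical'].mode())
# method 2: count rows per category, then take the label with the largest count
answer = df_train.groupby(['Electrical']).count()['x'].idxmax()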

print(answer)

Out:
'a'
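One caveat with method 1, shown here on made-up data: when several values tie for the highest count, mode() returns all of them (in sorted order), so .iloc[0] is a simple way to pick a single scalar if only one value is needed.

```python
import pandas as pd

# Made-up column with a tie: "a" and "b" both occur twice.
s = pd.Series(["b", "a", "b", "a", "c"], name="Electrical")

modes = s.mode()
print(modes.tolist())  # ['a', 'b']: every tied value, sorted
print(modes.iloc[0])   # a: one scalar when a single value is needed
```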