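I need the median of a pandas DataFrame column that holds string values, but I don't understand why I get the error below. I expected it to give me the most frequently repeated value instead. Why does the median function try to convert the values to float?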
df_train["Electrical"]
0 SBrkr
1 SBrkr
2 SBrkr
3 SBrkr
4 SBrkr
Error:
df_train["Electrical"].median()
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
/opt/conda/lib/python3.6/site-packages/pandas/core/nanops.py in f(values, axis, skipna, **kwds)
127 else:
--> 128 result = alt(values, axis=axis, skipna=skipna, **kwds)
129 except Exception:
/opt/conda/lib/python3.6/site-packages/pandas/core/nanops.py in nanmedian(values, axis, skipna)
379 if not is_float_dtype(values):
--> 380 values = values.astype('f8')
381 values[mask] = np.nan
ValueError: could not convert string to float: 'SBrkr'
During handling of the above exception, another exception occurred:
ValueError Traceback (most recent call last)
/opt/conda/lib/python3.6/site-packages/pandas/core/nanops.py in f(values, axis, skipna, **kwds)
130 try:
--> 131 result = alt(values, axis=axis, skipna=skipna, **kwds)
132 except ValueError as e:
/opt/conda/lib/python3.6/site-packages/pandas/core/nanops.py in nanmedian(values, axis, skipna)
379 if not is_float_dtype(values):
--> 380 values = values.astype('f8')
381 values[mask] = np.nan
ValueError: could not convert string to float: 'SBrkr'
During handling of the above exception, another exception occurred:
TypeError Traceback (most recent call last)
<ipython-input-89-79051d8f64cf> in <module>()
----> 1 df_train["Electrical"].median()
/opt/conda/lib/python3.6/site-packages/pandas/core/generic.py in stat_func(self, axis, skipna, level, numeric_only, **kwargs)
9611 skipna=skipna)
9612 return self._reduce(f, name, axis=axis, skipna=skipna,
-> 9613 numeric_only=numeric_only)
9614
9615 return set_function_name(stat_func, name, cls)
/opt/conda/lib/python3.6/site-packages/pandas/core/series.py in _reduce(self, op, name, axis, skipna, numeric_only, filter_type, **kwds)
3219 'numeric_only.'.format(name))
3220 with np.errstate(all='ignore'):
-> 3221 return op(delegate, skipna=skipna, **kwds)
3222
3223 return delegate._reduce(op=op, name=name, axis=axis, skipna=skipna,
/opt/conda/lib/python3.6/site-packages/pandas/core/nanops.py in _f(*args, **kwargs)
75 try:
76 with np.errstate(invalid='ignore'):
---> 77 return f(*args, **kwargs)
78 except ValueError as e:
79 # we want to transform an object array
/opt/conda/lib/python3.6/site-packages/pandas/core/nanops.py in f(values, axis, skipna, **kwds)
137
138 if is_object_dtype(values):
--> 139 raise TypeError(e)
140 raise
141
TypeError: could not convert string to float: 'SBrkr'
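I tried Googling and looked through Stack Overflow questions, but I did not find anything that solves my problem. So how can I get the median with pandas? Thank you for considering my question.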
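Answer 0: (score: 2)
In addition to bart cubrich's answer: if the main goal is to find how often the most frequent value occurs, you can also do the following. Note that max() returns the highest count, not the value itself; use idxmax() in its place to get the value.
import pandas as pd
df["name"].value_counts().max()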
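To make the difference between the count and the value concrete, here is a minimal sketch. The Series contents are made-up toy data, not the asker's real column.

```python
import pandas as pd

# Toy data (an assumption for illustration only).
s = pd.Series(["SBrkr", "SBrkr", "FuseA", "SBrkr", "FuseF"])

counts = s.value_counts()    # occurrences of each distinct value
top_count = counts.max()     # how many times the most frequent value occurs
top_label = counts.idxmax()  # the most frequent value itself

print(top_count)
print(top_label)
```

So max() answers "how many times", while idxmax() answers "which value".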
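Answer 1: (score: 1)
The median is the value at position (n + 1) ÷ 2 of the sorted data, where n is the number of items in the set (for an even n, it is the average of the two middle values).
That calculation needs numbers, but you are applying it to non-numeric strings, so pandas tries to cast them to float and fails.
If you want the most frequent value, try
df_train["Electrical"].value_counts().idxmax()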
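A small sketch of the contrast, using made-up toy data: median() works on a numeric Series, fails on a string Series, and value_counts().idxmax() gives the most frequent string. The exact exception message differs between pandas versions, so the sketch catches both TypeError and ValueError and does not rely on the message.

```python
import pandas as pd

# Toy data (assumptions for illustration only).
numeric = pd.Series([3, 1, 2])
strings = pd.Series(["SBrkr", "SBrkr", "FuseA"])

# The median of numbers is well defined: sort, then take the middle value.
numeric_median = numeric.median()
print(numeric_median)

# The median of plain strings is not supported, so pandas raises.
try:
    strings.median()
    median_failed = False
except (TypeError, ValueError) as exc:
    median_failed = True
    print("median failed:", exc)

# The most frequent value is what the question is really after.
most_common = strings.value_counts().idxmax()
print(most_common)
```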
Answer 2: (score: 0)
There are many ways to do this. You can use mode(), or you can group by the column you are interested in and count.
import numpy as np
import pandas as pd
df_train=pd.DataFrame(np.random.random((10, 2)), columns=['x','y'])
df_train['Electrical']=['a','a','a','a','a','a','b','b','b','b']
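# Method 1: mode() returns a Series holding the most frequent value(s)
print(df_train['Electrical'].mode())
# Method 2: group by the column, count the rows per group, take the label with the largest count
answer = df_train.groupby(['Electrical']).count()['x'].idxmax()
print(answer)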
Out:
'a'
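One difference between the two methods shows up when several values tie for most frequent. The sketch below uses made-up toy data: mode() returns every tied value as a Series, while idxmax() returns a single label, and which of the tied labels it picks should not be relied on.

```python
import pandas as pd

# Toy data with a tie: "a" and "b" both occur twice (an assumption for illustration).
s = pd.Series(["a", "a", "b", "b", "c"])

modes = s.mode()                  # Series containing all tied values
first_mode = modes.iloc[0]        # pick one explicitly if a single value is needed
top = s.value_counts().idxmax()   # a single label among the tied ones

print(modes.tolist())
print(first_mode)
print(top)
```

If a single scalar is needed, for example to fill missing entries, taking modes.iloc[0] makes the choice explicit.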