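I am trying to compute the mean and standard deviation of pandas DataFrame columns whose cells contain lists of numbers. I don't think I should have to extract each list to do the calculation, so I tried to do it inside the DataFrame. Surprisingly, I could not find anything on this specific topic.
Here is a toy example to illustrate my problem: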
import pandas as pd

l = pd.DataFrame({'D' : [[4,5,6,6,6],[6,8,8,3]], 'R' : [[3,5,6,4,6],[6,9,9,3]]})
l1 = l.apply(pd.to_numeric).mean()
l2 = l.apply(pd.to_numeric).std()
I get the following error:
Traceback (most recent call last):
File "pandas/_libs/lib.pyx", line 1892, in pandas._libs.lib.maybe_convert_numeric
TypeError: Invalid object type
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/pierre/Desktop/Project_inv/pr.py", line 8, in <module>
l1 = l.apply(pd.to_numeric).mean()
File "/Users/pierre/PycharmProjects/untitled22/venv/lib/python3.7/site-packages/pandas/core/frame.py", line 6487, in apply
return op.get_result()
File "/Users/pierre/PycharmProjects/untitled22/venv/lib/python3.7/site-packages/pandas/core/apply.py", line 151, in get_result
return self.apply_standard()
File "/Users/pierre/PycharmProjects/untitled22/venv/lib/python3.7/site-packages/pandas/core/apply.py", line 257, in apply_standard
self.apply_series_generator()
File "/Users/pierre/PycharmProjects/untitled22/venv/lib/python3.7/site-packages/pandas/core/apply.py", line 286, in apply_series_generator
results[i] = self.f(v)
File "/Users/pierre/PycharmProjects/untitled22/venv/lib/python3.7/site-packages/pandas/core/tools/numeric.py", line 135, in to_numeric
coerce_numeric=coerce_numeric)
File "pandas/_libs/lib.pyx", line 1925, in pandas._libs.lib.maybe_convert_numeric
TypeError: ('Invalid object type at position 0', 'occurred at index D')
I am not sure what is going wrong. Could someone give me a hint on how to solve this?
Answer 0 (score: 1)
First, I don't think working with lists inside pandas is a good idea.
But if you really need it, you can process the cells element-wise with DataFrame.applymap:
l1 = l.applymap(lambda x: np.mean(x))
print (l1)
D R
0 5.40 4.80
1 6.25 6.75
l2 = l.applymap(lambda x: np.std(x))
print (l2)
D R
0 0.800000 1.166190
1 2.046338 2.487469
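As a self-contained sketch of the element-wise approach above (the imports and the `elementwise` helper name are assumptions added for illustration): on recent pandas versions `applymap` is deprecated in favor of `DataFrame.map` (pandas 2.1+), so the snippet uses whichever one is available. Note that `np.std` defaults to the population standard deviation (`ddof=0`), while pandas' own `std` defaults to the sample version (`ddof=1`).

```python
import numpy as np
import pandas as pd

l = pd.DataFrame({'D': [[4, 5, 6, 6, 6], [6, 8, 8, 3]],
                  'R': [[3, 5, 6, 4, 6], [6, 9, 9, 3]]})

# Hypothetical helper: DataFrame.map exists in pandas 2.1+,
# older versions only provide applymap.
elementwise = l.map if hasattr(l, 'map') else l.applymap

# Each cell is a Python list, so the function receives one list at a time.
l1 = elementwise(lambda x: np.mean(x))
l2 = elementwise(lambda x: np.std(x))                 # population std (ddof=0)
l2_sample = elementwise(lambda x: np.std(x, ddof=1))  # sample std (ddof=1)

print(l1)
print(l2)
print(l2_sample)
```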
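So I suggest flattening the lists first, for example with explode (DataFrame.explode, or Series.explode as used here; available in pandas 0.25+), and then processing: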
df = pd.concat([l['D'].explode(), l['R'].explode()], axis=1).astype(int)
print (df)
D R
0 4 3
0 5 5
0 6 6
0 6 4
0 6 6
1 6 6
1 8 9
1 8 9
1 3 3
l1 = df.mean(level=0)
print (l1)
D R
0 5.40 4.80
1 6.25 6.75
l2 = df.std(level=0)
print (l2)
D R
0 0.894427 1.303840
1 2.362908 2.872281
l21 = df.std(level=0, ddof=0)
print (l21)
D R
0 0.800000 1.166190
1 2.046338 2.487469
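The `level` argument of `mean` and `std` was deprecated in pandas 1.3 and removed in pandas 2.0. On newer versions, the same per-row statistics can presumably be obtained by grouping on the index level instead. A minimal sketch under that assumption (the variable `g` is introduced here for illustration):

```python
import pandas as pd

l = pd.DataFrame({'D': [[4, 5, 6, 6, 6], [6, 8, 8, 3]],
                  'R': [[3, 5, 6, 4, 6], [6, 9, 9, 3]]})

# Explode each column separately; the original row label is repeated
# once per list element, so both Series share the same index.
df = pd.concat([l['D'].explode(), l['R'].explode()], axis=1).astype(int)

# Group by the repeated row label (index level 0) instead of level=0.
g = df.groupby(level=0)

l1 = g.mean()        # mean per original row
l2 = g.std()         # sample std (ddof=1), the pandas default
l21 = g.std(ddof=0)  # population std, matches np.std

print(l1)
print(l2)
print(l21)
```

This relies on the lists in `D` and `R` having the same length within each row, as in the example; otherwise the exploded columns would not line up.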