I am trying to use df.apply() in pandas, but I get the error below. The function is supposed to set each entry to 0 if the entry is less than a "threshold".
from pandas import *
import numpy as np
def discardValueLessThan(x, threshold):
    if x < threshold:
        return 0
    else:
        return x
df = DataFrame(np.random.randn(8, 3), columns=['A', 'B', 'C'])
>>> df
          A         B         C
0 -1.389871  1.362458  1.531723
1 -1.200067 -1.114360 -0.020958
2 -0.064653  0.426051  1.856164
3  1.103067  0.194196  0.077709
4  2.675069 -0.848347  0.152521
5 -0.773200 -0.712175 -0.022908
6 -0.796237  0.016256  0.390068
7 -0.413894  0.190118 -0.521194
>>> df.apply(discardValueLessThan, 0.1)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Library/Frameworks/EPD64.framework/Versions/7.3/lib/python2.7/site-packages/pandas-0.8.1-py2.7-macosx-10.5-x86_64.egg/pandas/core/frame.py", line 3576, in apply
return self._apply_standard(f, axis)
File "/Library/Frameworks/EPD64.framework/Versions/7.3/lib/python2.7/site-packages/pandas-0.8.1-py2.7-macosx-10.5-x86_64.egg/pandas/core/frame.py", line 3637, in _apply_standard
e.args = e.args + ('occurred at index %s' % str(k),)
UnboundLocalError: local variable 'k' referenced before assignment
Answer 0 (score: 2)
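The error message itself looks like a pandas bug to me, but I think there are two other problems.
First, I think you have to pass additional arguments to apply either as named arguments or through args. Your second positional argument is probably being interpreted as the axis, since axis is the second positional parameter of DataFrame.apply. But if you use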
df.apply(discardValueLessThan, args=(0.1,))
or
df.apply(discardValueLessThan, threshold=0.1)
then you will get
ValueError: ('The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()', 'occurred at index A')
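because apply does not operate element-wise: it passes each whole column to the function as a Series object, so x < threshold produces a boolean Series rather than a single True or False. Alternatives include applymap (deprecated in favor of DataFrame.map since pandas 2.1) or boolean indexing, i.e.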
In [47]: df = DataFrame(np.random.randn(3, 3), columns=['A', 'B', 'C'])
In [48]: df
Out[48]:
          A         B         C
0 -0.135336 -0.274687  1.480949
1 -1.079800 -0.618610 -0.321235
2 -0.610420 -0.422112  0.102703
In [49]: df1 = df.applymap(lambda x: discardValueLessThan(x, 0.1))
In [50]: df1
Out[50]:
   A  B         C
0  0  0  1.480949
1  0  0  0.000000
2  0  0  0.102703
or just
In [51]: df[df < 0.1] = 0
In [52]: df
Out[52]:
   A  B         C
0  0  0  1.480949
1  0  0  0.000000
2  0  0  0.102703
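Note that the boolean-indexing assignment modifies df in place. If the original frame should be kept, a non-mutating variant might look like the sketch below. It assumes a small made-up frame and a threshold of 0.1 (both chosen only for illustration, not taken from the question) and relies on DataFrame.where and DataFrame.mask, which return a new frame.

```python
import pandas as pd

# Illustrative data (not from the question), chosen so the result is easy to check.
df = pd.DataFrame({"A": [-0.5, 0.2, 0.05], "B": [1.5, -1.0, 0.1]})
threshold = 0.1

# where: keep entries for which the condition is True, replace the rest with 0.
kept = df.where(df >= threshold, 0)

# mask: the mirror image, replace entries for which the condition is True.
masked = df.mask(df < threshold, 0)

print(kept)
print(df)  # the original frame is unchanged
```

The two calls agree here because the data has no NaN values. With missing data they can differ, since NaN compares False under both >= and <.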
Answer 1 (score: 0)
You need to call it like this:
df.apply(discardValueLessThan, args=(0.1,))
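The way you are calling it, 0.1 is not passed to discardValueLessThan as threshold; as the second positional argument of apply, it is taken as the axis instead.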
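Passing the threshold correctly is only half of the fix, because apply still hands the function a whole column rather than a single number. As a rough sketch, assuming a hypothetical column-aware helper named discard_below (not part of the question's code) and a small made-up frame, both ways of passing the extra argument could look like this:

```python
import pandas as pd

def discard_below(col, threshold):
    # Hypothetical helper: col is a whole column (a Series), so a plain
    # `if col < threshold` would be ambiguous. Use a vectorized replacement.
    return col.where(col >= threshold, 0)

# Illustrative data, not taken from the question.
df = pd.DataFrame({"A": [-0.5, 0.2, 0.05], "B": [1.5, -1.0, 0.1]})

# Extra positional arguments go through args (note the one-element tuple).
by_args = df.apply(discard_below, args=(0.1,))

# Extra keyword arguments are forwarded to the function unchanged.
by_keyword = df.apply(discard_below, threshold=0.1)

print(by_args)
```

Both calls leave axis at its default of 0, so the helper is invoked once per column.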