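The IPython and Jupyter documentation says that get_ipython().magic() is deprecated. But when I change my code to use run_line_magic, it fails to push the variable to R (see below). This may be related to this issue: https://bitbucket.org/rpy2/rpy2/issues/184/valueerror-call-stack-is-not-deep-enough
I'm using Anaconda with Python 2.7 on Mac OS X Yosemite. I have just updated Anaconda and rpy2. The code below is from a Jupyter notebook.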
%load_ext rpy2.ipython
import pandas as pd
'''Two test functions with rpy2.
The only difference between them is that
rpy2fun_magic uses 'magic' to push variable to R and
rpy2fun_linemagic uses 'run_line_magic' to push variable.
'magic' works fine. 'run_line_magic' returns an error.'''
def rpy2fun_magic(df):
    get_ipython().magic('R -i df')
    get_ipython().run_line_magic('R','df_cor <- cor(df)')
    get_ipython().run_line_magic('R','-o df_cor')
    return (df_cor)

def rpy2fun_linemagic(df):
    get_ipython().run_line_magic('R','-i df')
    get_ipython().run_line_magic('R','df_cor <- cor(df)')
    get_ipython().run_line_magic('R','-o df_cor')
    return (df_cor)
dataframetest = pd.DataFrame([[1,2,3,4],[6,3,4,5],[9,1,7,3]])
df_cor_magic = rpy2fun_magic(dataframetest)
print 'Using magic to push variable works fine\n'
print df_cor_magic
print '\nBut using run_line_magic returns an error\n'
df_cor_linemagic = rpy2fun_linemagic(dataframetest)
Using magic to push variable works fine
[[ 1. -0.37115374 0.91129318 -0.37115374]
[-0.37115374 1. -0.72057669 1. ]
[ 0.91129318 -0.72057669 1. -0.72057669]
[-0.37115374 1. -0.72057669 1. ]]
But using run_line_magic returns an error
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
<ipython-input-1-e418b72a8621> in <module>()
28 print '\nBut using run_line_magic returns an error\n'
29
---> 30 df_cor_linemagic = rpy2fun_linemagic(dataframetest)
<ipython-input-1-e418b72a8621> in rpy2fun_linemagic(df)
15
16 def rpy2fun_linemagic(df):
---> 17 get_ipython().run_line_magic('R','-i df')
18 get_ipython().run_line_magic('R','df_cor <- cor(df)')
19 get_ipython().run_line_magic('R','-o df_cor')
/Users/alexmillner/anaconda/lib/python2.7/site-packages/IPython/core/interactiveshell.pyc in run_line_magic(self, magic_name, line)
2255 kwargs['local_ns'] = sys._getframe(stack_depth).f_locals
2256 with self.builtin_trap:
-> 2257 result = fn(*args,**kwargs)
2258 return result
2259
/Users/alexmillner/anaconda/lib/python2.7/site-packages/rpy2/ipython/rmagic.pyc in R(self, line, cell, local_ns)
/Users/alexmillner/anaconda/lib/python2.7/site-packages/IPython/core/magic.pyc in <lambda>(f, *a, **k)
191 # but it's overkill for just that one bit of state.
192 def magic_deco(arg):
--> 193 call = lambda f, *a, **k: f(*a, **k)
194
195 if callable(arg):
/Users/alexmillner/anaconda/lib/python2.7/site-packages/rpy2/ipython/rmagic.pyc in R(self, line, cell, local_ns)
657 val = self.shell.user_ns[input]
658 except KeyError:
--> 659 raise NameError("name '%s' is not defined" % input)
660 if args.converter is None:
661 ro.r.assign(input, self.pyconverter(val))
NameError: name 'df' is not defined
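Answer 0 (score: 0)
First, a discussion of the same problem with %timeit; the answer, with workarounds, is at the bottom. I'm using IPython 3.1.0 with Anaconda 2.7.10, so my observations may differ from yours because of version differences.
This is not specific to the R extension. You can reproduce it with something simpler, such as %timeit: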
In [47]: dfrm
Out[47]:
A B C
0 0.690466 0.370793 0.963782
1 0.478427 0.358897 0.689173
2 0.189277 0.268237 0.570624
3 0.735665 0.342549 0.509810
4 0.929736 0.090079 0.384444
5 0.210941 0.347164 0.852408
6 0.241940 0.187266 0.961489
7 0.768143 0.548450 0.604004
8 0.055765 0.842224 0.668782
9 0.717827 0.047011 0.948673
In [48]: def run_timeit(df):
   ....:     get_ipython().run_line_magic('timeit', 'df.sum()')
   ....:
In [49]: run_timeit(dfrm)
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
<ipython-input-49-1e62302232b6> in <module>()
----> 1 run_timeit(dfrm)
<ipython-input-48-0a3e09ec1e0c> in run_timeit(df)
1 def run_timeit(df):
----> 2 get_ipython().run_line_magic('timeit', 'df.sum()')
3
/home/ely/anaconda/lib/python2.7/site-packages/IPython/core/interactiveshell.pyc in run_line_magic(self, magic_name, line)
2226 kwargs['local_ns'] = sys._getframe(stack_depth).f_locals
2227 with self.builtin_trap:
-> 2228 result = fn(*args,**kwargs)
2229 return result
2230
/home/ely/anaconda/lib/python2.7/site-packages/IPython/core/magics/execution.pyc in timeit(self, line, cell)
/home/ely/anaconda/lib/python2.7/site-packages/IPython/core/magic.pyc in <lambda>(f, *a, **k)
191 # but it's overkill for just that one bit of state.
192 def magic_deco(arg):
--> 193 call = lambda f, *a, **k: f(*a, **k)
194
195 if callable(arg):
/home/ely/anaconda/lib/python2.7/site-packages/IPython/core/magics/execution.pyc in timeit(self, line, cell)
1034 number = 1
1035 for _ in range(1, 10):
-> 1036 time_number = timer.timeit(number)
1037 worst_tuning = max(worst_tuning, time_number / number)
1038 if time_number >= 0.2:
/home/ely/anaconda/lib/python2.7/site-packages/IPython/core/magics/execution.pyc in timeit(self, number)
130 gc.disable()
131 try:
--> 132 timing = self.inner(it, self.timer)
133 finally:
134 if gcold:
<magic-timeit> in inner(_it, _timer)
NameError: global name 'df' is not defined
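The problem is that the line magic is set up to look for variable names in the global scope, not in the function's scope. If the argument of a function like rpy2fun_linemagic happens to have the same name as a global variable, the inner code picks up that global, for example: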
In [52]: def run_timeit(dfrm):
   ....:     get_ipython().run_line_magic('timeit', 'dfrm.sum()')
   ....:
In [53]: run_timeit(dfrm)
The slowest run took 5.67 times longer than the fastest. This could mean that an intermediate result is being cached
10000 loops, best of 3: 99.1 µs per loop
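But this works only by accident, because the string passed to run_line_magic contains a name that happens to be found globally.
However, even with the plain magic function, I get the same error: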
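The scoping behaviour described here can be modelled without IPython. The sketch below is a simplified stand-in, not IPython's implementation: `eval_like_magic`, `user_ns`, and the three wrapper functions are hypothetical names, and `eval` plays the role of the magic running its code string.

```python
# Stand-in for the interactive (global) namespace; hypothetical data.
user_ns = {"dfrm": [1, 2, 3]}


def eval_like_magic(code, global_ns, local_ns=None):
    """Evaluate a code string against a global namespace, optionally with locals."""
    if local_ns is None:
        return eval(code, global_ns)
    return eval(code, global_ns, local_ns)


def sum_globals_only(df):
    # 'df' exists only as a function argument, so a globals-only lookup fails.
    try:
        return eval_like_magic("sum(df)", user_ns)
    except NameError as exc:
        return "NameError: %s" % exc


def sum_same_name(dfrm):
    # The argument shares its name with a global, so the lookup succeeds,
    # but it finds the global object, not the argument.
    return eval_like_magic("sum(dfrm)", user_ns)


def sum_with_locals(df):
    # Passing the caller's locals explicitly makes the argument visible.
    return eval_like_magic("sum(df)", user_ns, locals())
```

Calling `sum_same_name([4, 5, 6])` returns 6, the sum of the global list, which shows why the "matching name" case succeeds only by accident.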
In [58]: def run_timeit(df):
   ....:     get_ipython().magic('timeit df.sum()')
   ....:
In [59]: run_timeit(dfrm)
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
<ipython-input-59-1e62302232b6> in <module>()
----> 1 run_timeit(dfrm)
<ipython-input-58-e98c720ea7e8> in run_timeit(df)
1 def run_timeit(df):
----> 2 get_ipython().magic('timeit df.sum()')
3
/home/ely/anaconda/lib/python2.7/site-packages/IPython/core/interactiveshell.pyc in magic(self, arg_s)
2305 magic_name, _, magic_arg_s = arg_s.partition(' ')
2306 magic_name = magic_name.lstrip(prefilter.ESC_MAGIC)
-> 2307 return self.run_line_magic(magic_name, magic_arg_s)
2308
2309 #-------------------------------------------------------------------------
/home/ely/anaconda/lib/python2.7/site-packages/IPython/core/interactiveshell.pyc in run_line_magic(self, magic_name, line)
2226 kwargs['local_ns'] = sys._getframe(stack_depth).f_locals
2227 with self.builtin_trap:
-> 2228 result = fn(*args,**kwargs)
2229 return result
2230
/home/ely/anaconda/lib/python2.7/site-packages/IPython/core/magics/execution.pyc in timeit(self, line, cell)
/home/ely/anaconda/lib/python2.7/site-packages/IPython/core/magic.pyc in <lambda>(f, *a, **k)
191 # but it's overkill for just that one bit of state.
192 def magic_deco(arg):
--> 193 call = lambda f, *a, **k: f(*a, **k)
194
195 if callable(arg):
/home/ely/anaconda/lib/python2.7/site-packages/IPython/core/magics/execution.pyc in timeit(self, line, cell)
1034 number = 1
1035 for _ in range(1, 10):
-> 1036 time_number = timer.timeit(number)
1037 worst_tuning = max(worst_tuning, time_number / number)
1038 if time_number >= 0.2:
/home/ely/anaconda/lib/python2.7/site-packages/IPython/core/magics/execution.pyc in timeit(self, number)
130 gc.disable()
131 try:
--> 132 timing = self.inner(it, self.timer)
133 finally:
134 if gcold:
<magic-timeit> in inner(_it, _timer)
NameError: global name 'df' is not defined
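The tracebacks show run_line_magic computing `local_ns` from `sys._getframe(stack_depth).f_locals`. A fixed depth like that lands on a different frame depending on how many wrappers sit between the user's function and run_line_magic, which may be why `-i df` behaved differently through magic() in the question. The toy model below assumes a depth of 2 (the actual value is not visible in the tracebacks), and all function names are hypothetical.

```python
import sys


def fake_run_line_magic(name, stack_depth=2):
    # Mirrors the traceback line: local_ns = sys._getframe(stack_depth).f_locals
    # stack_depth=2 is an assumption made for this sketch.
    local_ns = sys._getframe(stack_depth).f_locals
    return name in local_ns


def fake_magic(name):
    # Adds one extra frame, like a wrapper that delegates to run_line_magic.
    return fake_run_line_magic(name)


def via_magic(df):
    # Frames seen from fake_run_line_magic: 0 = itself, 1 = fake_magic, 2 = via_magic
    return fake_magic("df")


def via_run_line_magic(df):
    # Frames: 0 = fake_run_line_magic, 1 = via_run_line_magic, 2 = whoever called us
    return fake_run_line_magic("df")
```

Under this assumption, `via_magic` sees `df` because depth 2 is the user's function, while `via_run_line_magic` looks one frame too far up and does not.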
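One (super bad) way around this is to search globals for an item that is the same object as the argument passed to your function. That gives you a global name to use, for example: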
In [68]: def run_timeit(df):
   ....:     for var_name, var_val in globals().iteritems():
   ....:         if df is var_val:
   ....:             get_ipython().run_line_magic('timeit', '%s.sum()'%(var_name))
   ....:             break
   ....:
In [69]: run_timeit(dfrm)
The slowest run took 5.72 times longer than the fastest. This could mean that an intermediate result is being cached
10000 loops, best of 3: 99.2 µs per loop
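But this is extremely fragile, because it depends on the object being reachable by name. If I passed an object such as an integer or a string, I would have to check whether it was interned or something like that; otherwise there would be no way to find it "by name" in the global namespace.
Another approach, which might be slightly better, is to use the user_ns namespace dict that IPython stores. Then at least you are not looking through globals, and you get more stability because you only see the specific variables the user named when assigning them in IPython: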
In [71]: def run_timeit(df):
   ....:     g = get_ipython()
   ....:     for var_name, var_val in g.user_ns.iteritems():
   ....:         if df is var_val:
   ....:             g.run_line_magic('timeit', '%s.sum()'%(var_name))
   ....:             break
   ....:
In [72]: run_timeit(dfrm)
The slowest run took 5.58 times longer than the fastest. This could mean that an intermediate result is being cached
10000 loops, best of 3: 99 µs per loop
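The identity search can be pulled out into a small helper that works on any namespace dict. This is a sketch: `find_name_by_identity` is a hypothetical name, and skipping underscore-prefixed names is an illustrative choice meant to avoid IPython's output-cache aliases such as `_` or `_47`. It uses `items()` so that it also runs on Python 3.

```python
def find_name_by_identity(obj, namespace):
    """Return the first public name in `namespace` bound to `obj`, or None.

    Objects are compared with `is`, so two equal but distinct objects
    do not match.
    """
    # Copy the items so the namespace can change while we iterate.
    for name, value in list(namespace.items()):
        if name.startswith("_"):
            continue  # skip private names and cached-output aliases
        if value is obj:
            return name
    return None
```

Inside IPython it would be called as `find_name_by_identity(df, get_ipython().user_ns)`. A return value of `None` means the object has no public name there, in which case this workaround cannot be used.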
For your specific R function call, I would try:
def rpy2fun_linemagic(df):
    g = get_ipython()
    for var_name, var_val in g.user_ns.iteritems():
        if df is var_val:
            g.run_line_magic('R', '-i %s'%(var_name))
            g.run_line_magic('R', 'df_cor <- cor(%s)'%(var_name))
            g.run_line_magic('R', '-o df_cor')
            return df_cor
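You may also have to be careful with the return statement. If converting the output back to Python creates the variable in the global scope rather than in the function's scope, you may need to use return g.user_ns['df_cor'] or something similar. Or, if the variable is created as a side effect, you may not want to return anything at all. I'm not a big fan of relying on that kind of implicit mutation, but it may work for you.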
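A variation on the same idea avoids searching for an existing name. It binds the object to a temporary name in user_ns and reads the result back from user_ns. This is a sketch under assumptions: `cor_via_user_ns` and `tmp_name` are hypothetical, `shell` is whatever `get_ipython()` returns, and the code assumes that `-i` finds names in `shell.user_ns` (as the question's traceback suggests) and that `-o` writes the result there.

```python
def cor_via_user_ns(shell, df, tmp_name="rpy2_tmp_df"):
    """Run R's cor() on `df` by exposing it under a temporary global name."""
    # The temporary name must also be a valid R identifier,
    # so it must not start with an underscore.
    shell.user_ns[tmp_name] = df
    try:
        shell.run_line_magic("R", "-i %s" % tmp_name)
        shell.run_line_magic("R", "df_cor <- cor(%s)" % tmp_name)
        shell.run_line_magic("R", "-o df_cor")
        # Assumption: -o stores the converted result in the interactive
        # namespace, so read it from there rather than from local variables.
        return shell.user_ns["df_cor"]
    finally:
        # Remove the temporary binding even if one of the magics fails.
        shell.user_ns.pop(tmp_name, None)
```

In a notebook this would be called as `cor_via_user_ns(get_ipython(), dataframetest)`. Note that `df_cor` remains in the interactive namespace afterwards, which is the side effect mentioned above.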
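Answer 1 (score: 0)
I suspect the code example you provided is only meant to demonstrate the problem with run_line_magic(), but for reference I'm adding a way to do the same thing without involving IPython.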
from rpy2.robjects import globalenv
def rpy2cor(df):
    fun = globalenv.get('cor', wantfun=True)
    df_cor = fun(df)
    return df_cor