Error using run_line_magic with rpy2 in Jupyter / IPython

Time: 2015-12-09 15:38:04

Tags: ipython ipython-notebook rpy2 jupyter jupyter-notebook

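The IPython and Jupyter documentation says that get_ipython().magic() is deprecated. But when I change my code to use run_line_magic instead, it fails to push the variable to R (see below). This may be related to this issue: https://bitbucket.org/rpy2/rpy2/issues/184/valueerror-call-stack-is-not-deep-enough

I am on Mac OS X Yosemite, using Anaconda with Python 2.7. I have just updated Anaconda and rpy2. The code below is from a Jupyter notebook.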

%load_ext rpy2.ipython
import pandas as pd

'''Two test functions with rpy2.
The only difference between them is that 
rpy2fun_magic uses 'magic' to push variable to R and 
rpy2fun_linemagic uses 'run_line_magic' to push variable. 
'magic' works fine. 'run_line_magic' returns an error.'''

def rpy2fun_magic(df):
    get_ipython().magic('R -i df')
    get_ipython().run_line_magic('R','df_cor <- cor(df)')
    get_ipython().run_line_magic('R','-o df_cor')
    return (df_cor)

def rpy2fun_linemagic(df):
    get_ipython().run_line_magic('R','-i df')
    get_ipython().run_line_magic('R','df_cor <- cor(df)')
    get_ipython().run_line_magic('R','-o df_cor')
    return (df_cor)

dataframetest = pd.DataFrame([[1,2,3,4],[6,3,4,5],[9,1,7,3]])

df_cor_magic = rpy2fun_magic(dataframetest)
print 'Using magic to push variable works fine\n'
print df_cor_magic

print '\nBut using run_line_magic returns an error\n'

df_cor_linemagic = rpy2fun_linemagic(dataframetest)

Using magic to push variable works fine

[[ 1.         -0.37115374  0.91129318 -0.37115374]
[-0.37115374  1.         -0.72057669  1.        ]
[ 0.91129318 -0.72057669  1.         -0.72057669]
[-0.37115374  1.         -0.72057669  1.        ]]

But using run_line_magic returns an error

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-1-e418b72a8621> in <module>()
      28 print '\nBut using run_line_magic returns an error\n'
      29 
 ---> 30 df_cor_linemagic = rpy2fun_linemagic(dataframetest)

 <ipython-input-1-e418b72a8621> in rpy2fun_linemagic(df)
      15 
      16 def rpy2fun_linemagic(df):
 ---> 17     get_ipython().run_line_magic('R','-i df')
      18     get_ipython().run_line_magic('R','df_cor <- cor(df)')
      19     get_ipython().run_line_magic('R','-o df_cor')

 /Users/alexmillner/anaconda/lib/python2.7/site-packages/IPython/core/interactiveshell.pyc in run_line_magic(self, magic_name,   line)
       2255                 kwargs['local_ns'] = sys._getframe(stack_depth).f_locals
       2256             with self.builtin_trap:
    -> 2257                 result = fn(*args,**kwargs)
       2258             return result
       2259 

/Users/alexmillner/anaconda/lib/python2.7/site-packages/rpy2/ipython/rmagic.pyc in R(self, line, cell, local_ns)

/Users/alexmillner/anaconda/lib/python2.7/site-packages/IPython/core/magic.pyc in <lambda>(f, *a, **k)
       191     # but it's overkill for just that one bit of state.
       192     def magic_deco(arg):
   --> 193         call = lambda f, *a, **k: f(*a, **k)
       194 
       195         if callable(arg):

/Users/alexmillner/anaconda/lib/python2.7/site-packages/rpy2/ipython/rmagic.pyc in R(self, line, cell, local_ns)
       657                         val = self.shell.user_ns[input]
       658                     except KeyError:
   --> 659                         raise NameError("name '%s' is not defined" % input)
       660                 if args.converter is None:
       661                     ro.r.assign(input, self.pyconverter(val))

NameError: name 'df' is not defined

2 answers:

Answer 0 (score: 0)

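I first discuss the same problem using %timeit; a workaround is at the bottom of this answer. I am using IPython 3.1.0 with Anaconda Python 2.7.10, so my observations may differ from yours because of version differences.

This is not specific to the R extension. You can reproduce it with something simpler, such as %timeit: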

In [47]: dfrm
Out[47]: 
          A         B         C
0  0.690466  0.370793  0.963782
1  0.478427  0.358897  0.689173
2  0.189277  0.268237  0.570624
3  0.735665  0.342549  0.509810
4  0.929736  0.090079  0.384444
5  0.210941  0.347164  0.852408
6  0.241940  0.187266  0.961489
7  0.768143  0.548450  0.604004
8  0.055765  0.842224  0.668782
9  0.717827  0.047011  0.948673

In [48]: def run_timeit(df):
    get_ipython().run_line_magic('timeit', 'df.sum()')
   ....:     

In [49]: run_timeit(dfrm)
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-49-1e62302232b6> in <module>()
----> 1 run_timeit(dfrm)

<ipython-input-48-0a3e09ec1e0c> in run_timeit(df)
      1 def run_timeit(df):
----> 2     get_ipython().run_line_magic('timeit', 'df.sum()')
      3 

/home/ely/anaconda/lib/python2.7/site-packages/IPython/core/interactiveshell.pyc in run_line_magic(self, magic_name, line)
   2226                 kwargs['local_ns'] = sys._getframe(stack_depth).f_locals
   2227             with self.builtin_trap:
-> 2228                 result = fn(*args,**kwargs)
   2229             return result
   2230 

/home/ely/anaconda/lib/python2.7/site-packages/IPython/core/magics/execution.pyc in timeit(self, line, cell)

/home/ely/anaconda/lib/python2.7/site-packages/IPython/core/magic.pyc in <lambda>(f, *a, **k)
    191     # but it's overkill for just that one bit of state.
    192     def magic_deco(arg):
--> 193         call = lambda f, *a, **k: f(*a, **k)
    194 
    195         if callable(arg):

/home/ely/anaconda/lib/python2.7/site-packages/IPython/core/magics/execution.pyc in timeit(self, line, cell)
   1034             number = 1
   1035             for _ in range(1, 10):
-> 1036                 time_number = timer.timeit(number)
   1037                 worst_tuning = max(worst_tuning, time_number / number)
   1038                 if time_number >= 0.2:

/home/ely/anaconda/lib/python2.7/site-packages/IPython/core/magics/execution.pyc in timeit(self, number)
    130         gc.disable()
    131         try:
--> 132             timing = self.inner(it, self.timer)
    133         finally:
    134             if gcold:

<magic-timeit> in inner(_it, _timer)

NameError: global name 'df' is not defined

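The problem is that the line magic is set up to look up variable names in the global scope, not in the scope of the calling function. If the argument name of your function (rpy2fun_linemagic in your case, run_timeit here) happens to coincide with the name of a global variable, the code inside the magic picks up that global, for example: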

In [52]: def run_timeit(dfrm):
    get_ipython().run_line_magic('timeit', 'dfrm.sum()')
   ....:     

In [53]: run_timeit(dfrm)
The slowest run took 5.67 times longer than the fastest. This could mean that an intermediate result is being cached 
10000 loops, best of 3: 99.1 µs per loop

But this works only by accident, because the string passed to run_line_magic happens to contain a name that can be found globally.
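The lookup behaviour described above can be imitated with the standard library alone. The following is only a minimal sketch of the scoping issue, not IPython's actual implementation: `user_ns` is a plain dict standing in for the interactive namespace, and `evaluate_in`, `total` and `total_with_locals` are hypothetical names made up for this illustration.

```python
def evaluate_in(namespace, expression):
    """Evaluate an expression string using only the given namespace dict."""
    return eval(expression, namespace)


# Stands in for the interactive (global) namespace of the session.
user_ns = {'dfrm': [1, 2, 3]}


def total(df):
    # 'df' exists only as a local of this function, so a lookup that
    # consults just the global namespace cannot find it.
    try:
        return evaluate_in(user_ns, 'sum(df)')
    except NameError as exc:
        return 'NameError: %s' % exc


def total_with_locals(df):
    # Supplying the local name explicitly makes it visible to the expression.
    return eval('sum(df)', user_ns, {'df': df})


print(total(user_ns['dfrm']))              # reports a NameError for 'df'
print(total_with_locals(user_ns['dfrm']))  # the local is visible here
print(evaluate_in(user_ns, 'sum(dfrm)'))   # the global name resolves
```

The string handed to the evaluator is resolved against whatever namespaces the evaluator was given, not against the function that happened to build the string, which is why the name of the function's parameter is irrelevant unless a global with the same name exists.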

However, even with the plain magic function I get the same error:

In [58]: def run_timeit(df):
    get_ipython().magic('timeit df.sum()')
   ....:     

In [59]: run_timeit(dfrm)
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-59-1e62302232b6> in <module>()
----> 1 run_timeit(dfrm)

<ipython-input-58-e98c720ea7e8> in run_timeit(df)
      1 def run_timeit(df):
----> 2     get_ipython().magic('timeit df.sum()')
      3 

/home/ely/anaconda/lib/python2.7/site-packages/IPython/core/interactiveshell.pyc in magic(self, arg_s)
   2305         magic_name, _, magic_arg_s = arg_s.partition(' ')
   2306         magic_name = magic_name.lstrip(prefilter.ESC_MAGIC)
-> 2307         return self.run_line_magic(magic_name, magic_arg_s)
   2308 
   2309     #-------------------------------------------------------------------------

/home/ely/anaconda/lib/python2.7/site-packages/IPython/core/interactiveshell.pyc in run_line_magic(self, magic_name, line)
   2226                 kwargs['local_ns'] = sys._getframe(stack_depth).f_locals
   2227             with self.builtin_trap:
-> 2228                 result = fn(*args,**kwargs)
   2229             return result
   2230 

/home/ely/anaconda/lib/python2.7/site-packages/IPython/core/magics/execution.pyc in timeit(self, line, cell)

/home/ely/anaconda/lib/python2.7/site-packages/IPython/core/magic.pyc in <lambda>(f, *a, **k)
    191     # but it's overkill for just that one bit of state.
    192     def magic_deco(arg):
--> 193         call = lambda f, *a, **k: f(*a, **k)
    194 
    195         if callable(arg):

/home/ely/anaconda/lib/python2.7/site-packages/IPython/core/magics/execution.pyc in timeit(self, line, cell)
   1034             number = 1
   1035             for _ in range(1, 10):
-> 1036                 time_number = timer.timeit(number)
   1037                 worst_tuning = max(worst_tuning, time_number / number)
   1038                 if time_number >= 0.2:

/home/ely/anaconda/lib/python2.7/site-packages/IPython/core/magics/execution.pyc in timeit(self, number)
    130         gc.disable()
    131         try:
--> 132             timing = self.inner(it, self.timer)
    133         finally:
    134             if gcold:

<magic-timeit> in inner(_it, _timer)

NameError: global name 'df' is not defined

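One (super bad) way to work around this is to search globals() for the entry that is the very same object as the argument passed to your function; that gives you a global name to use.

For example: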

In [68]: def run_timeit(df):
    for var_name, var_val in globals().iteritems():
        if df is var_val:
            get_ipython().run_line_magic('timeit', '%s.sum()'%(var_name))
            break
   ....:         

In [69]: run_timeit(dfrm)
The slowest run took 5.72 times longer than the fastest. This could mean that an intermediate result is being cached 
10000 loops, best of 3: 99.2 µs per loop

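But this is very fragile, because it relies on the function receiving the very same object that some global name is bound to. If I passed an object such as an integer or a string, I would have to check whether it is interned or something like that; otherwise there would be no way to locate it "by name" in the global namespace.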
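The fragility of the identity search can be seen without IPython. This is a small sketch under the assumption that the namespace is an ordinary dict; `find_name` and `ns` are hypothetical names used only for illustration.

```python
def find_name(namespace, obj):
    """Return a name bound to exactly this object, or None if there is none."""
    for name, value in list(namespace.items()):
        if value is obj:  # identity comparison, not equality
            return name
    return None


data = [1, 2]
ns = {'a': data, 'alias': data}

# Two names are bound to the same object; which one comes back
# depends on the iteration order of the dict.
print(find_name(ns, data))

# An equal but distinct object is bound to no name at all.
print(find_name(ns, [1, 2]))
```

The same limitation applies to any temporary value, such as the result of an expression passed directly as an argument: it has no name in the namespace, so the search finds nothing.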

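Another, possibly slightly better, approach is to use the user_ns namespace dict that IPython maintains. Then at least you are not searching all globals, and you get somewhat more stability for the specific variables the user has named by assignment in IPython: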

In [71]: def run_timeit(df):
   ....:     g = get_ipython()
   ....:     for var_name, var_val in g.user_ns.iteritems():
   ....:         if df is var_val:
   ....:             g.run_line_magic('timeit', '%s.sum()'%(var_name))
   ....:             break
   ....:         

In [72]: run_timeit(dfrm)
The slowest run took 5.58 times longer than the fastest. This could mean that an intermediate result is being cached 
10000 loops, best of 3: 99 µs per loop

For your specific R function call, I would try:

def rpy2fun_linemagic(df):
    g = get_ipython()
    for var_name, var_val in g.user_ns.iteritems():
        if df is var_val:
            g.run_line_magic('R', '-i %s'%(var_name))
            g.run_line_magic('R', 'df_cor <- cor(%s)'%(var_name))
            g.run_line_magic('R', '-o df_cor')
            return df_cor

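You may also have to be careful with the return statement. If converting the output back to Python creates the variable in the global scope rather than in the function scope, you may need to use return g.user_ns['df_cor'] or something similar. Or, if the variable is created as a side effect, you may not want to return anything at all. I am not a big fan of relying on implicit mutation like that, but it may work for you.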
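A variation that avoids the identity search is to place the object into user_ns under a known name first, then read the result back from user_ns explicitly. The sketch below is untested against a real rpy2 session and rests on several assumptions: `shell` is the object returned by get_ipython(), `push` is IPython's `InteractiveShell.push`, `%R -i` falls back to `shell.user_ns` as the traceback in the question suggests, and `%R -o` writes its output into `user_ns`. The helper name `cor_via_r_magic` is hypothetical, and the names `df` and `df_cor` in the user's namespace are overwritten.

```python
def cor_via_r_magic(shell, df):
    """Push df into the user namespace, run %R on it, and fetch the result.

    Assumes `shell` behaves like the object returned by get_ipython() and
    that the rpy2 extension has already been loaded.
    """
    # Make the local object visible under a known name in user_ns
    # (this overwrites any existing user variable called 'df').
    shell.push({'df': df})
    shell.run_line_magic('R', '-i df')
    shell.run_line_magic('R', 'df_cor <- cor(df)')
    shell.run_line_magic('R', '-o df_cor')
    # Read the output explicitly instead of relying on global name lookup.
    return shell.user_ns['df_cor']
```

In a notebook this would be called as cor_via_r_magic(get_ipython(), dataframetest) after %load_ext rpy2.ipython. Taking the shell as a parameter keeps the helper easy to exercise with a stand-in object, at the cost of clobbering two names in the user's namespace.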

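Answer 1 (score: 0)

I suspect the code example you provided is only meant to demonstrate the problem with run_line_magic(), but for reference I am adding a way to do the same thing without involving ipython.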

from rpy2.robjects import globalenv
def rpy2cor(df):
    fun = globalenv.get('cor', wantfun=True)
    df_cor = fun(df)
    return df_cor