Unable to push a DataFrame to IPython parallel engines

Posted: 2013-11-13 19:04:08

Tags: python pandas ipython dataframe ipython-parallel

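I'm pushing a pandas DataFrame with 1MM rows and 35 columns to the IPython parallel engines through a DirectView. However, the data (even an empty DataFrame) never seems to make it onto the engines: my function fails as soon as it tries to print the DataFrame's length. Here is a snippet of my code: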

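ipcluster start -n 4    # run in a shell first: starts a cluster with 4 engines

from IPython.parallel import Client  # where Client lives in IPython 0.13
from pandas import DataFrame

def myfn():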
  rc = Client()
  dview = rc[:]
  data = ..... #queried from some source of 1MM rows
  dview.push(dict(data=data,new=DataFrame()))
  async = dview.map_async(f,range(3))

  return async 

def f(n):
  test = DataFrame() 
  x = len(data) # type data = pandas.core.frame.DataFrame
  #print len(test) #works fine, gets three "0"s
  #print len(new)  # empty DF, gets an error below
  print len(data)  # 1MM row DF, gets an error below
  return x

This is the error I get when I look at async.stdout. Any help is appreciated!

In [205]: x1.stdout
Out[205]: a list of three identical strings (one per task), each containing:

Traceback (most recent call last):
  File "/myProj/ipython/0.13.2-py27/lib/IPython/core/ultratb.py", line 760, in structured_traceback
    records = _fixed_getinnerframes(etb, context, tb_offset)
  File "/myProj/ipython/0.13.2-py27/lib/IPython/core/ultratb.py", line 242, in _fixed_getinnerframes
    records  = fix_frame_records_filenames(inspect.getinnerframes(etb, context))
  File "//myProj/core/2.7.3-64/lib/python2.7/inspect.py", line 1043, in getinnerframes
    framelist.append((tb.tb_frame,) + getframeinfo(tb, context))
  File "//myProj/core/2.7.3-64/lib/python2.7/inspect.py", line 1007, in getframeinfo
    lines, lnum = findsource(frame)
  File "//myProj/core/2.7.3-64/lib/python2.7/inspect.py", line 580, in findsource
    if pat.match(lines[lnum]): break
IndexError: list index out of range

1 answer:

Answer 0 (score: 1)

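There is a bug in IPython 0.13 that makes serializing DataFrames fail. It was fixed in IPython 1.0, so upgrading should solve the problem. If you can't upgrade for some reason, you will have to serialize the DataFrames yourself. The easiest way is to pickle the object before handing it to IPython and unpickle it on the other side. Obviously, upgrading is the better option if at all possible.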
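A minimal sketch of the pickle workaround, assuming the client and every engine run the same Python and pandas versions (a pickled DataFrame generally only unpickles reliably with a compatible pandas). The helper names `pack_for_push`, `unpack_on_engine` and `push_pickled`, and the temporary variable name, are hypothetical. Only `DirectView.push` and `DirectView.execute` come from IPython.parallel. The code is meant to run under both Python 2.7 and 3.

```python
import pickle


def pack_for_push(obj):
    """Pickle obj on the client so IPython only has to ship a byte string."""
    return pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)


def unpack_on_engine(blob):
    """Inverse of pack_for_push; this is the step that must happen on each engine."""
    return pickle.loads(blob)


def push_pickled(dview, name, obj):
    """Hypothetical helper: make `obj` available as the global `name` on every
    engine of `dview` without IPython serializing the DataFrame itself.

    The temporary variable name is an arbitrary choice for this sketch.
    """
    tmp = "_pickled_" + name
    # 1. Client side: push only the pickled bytes.
    dview.push({tmp: pack_for_push(obj)}, block=True)
    # 2. Engine side: same as unpack_on_engine, run as code on each engine,
    #    then delete the temporary byte string.
    dview.execute(
        "import pickle\n%s = pickle.loads(%s)\ndel %s" % (name, tmp, tmp),
        block=True,
    )
```

In the question's code, `push_pickled(dview, 'data', data)` would stand in for `dview.push(dict(data=data, ...))`, after which `f` can read `data` as an ordinary global on each engine.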