I have a fairly standard CSV dataset that I am trying to read in an IPython Notebook using rpy2 / Rmagic:
# R code
%load_ext rmagic
%R my.data <- read.csv("/Users/xxx/Documents/data.csv")
I get this error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-31-844400cf68c6> in <module>()
25 ####Chunk 1: Inputting and checking the data
---> 27 get_ipython().magic(u'R my.data <- read.csv("/Users/xxx/Documents/data.csv")')
28 get_ipython().magic(u'R summary(my.data)')
/Library/Frameworks/EPD64.framework/Versions/7.3/lib/python2.7/site-packages/IPython/core/interactiveshell.pyc in magic(self, arg_s)
2162 magic_name, _, magic_arg_s = arg_s.partition(' ')
2163 magic_name = magic_name.lstrip(prefilter.ESC_MAGIC)
-> 2164 return self.run_line_magic(magic_name, magic_arg_s)
2165
2166 #-------------------------------------------------------------------------
/Library/Frameworks/EPD64.framework/Versions/7.3/lib/python2.7/site-packages/IPython/core/interactiveshell.pyc in run_line_magic(self, magic_name, line)
2088 kwargs['local_ns'] = sys._getframe(stack_depth).f_locals
2089 with self.builtin_trap:
-> 2090 result = fn(*args,**kwargs)
2091 return result
2092
/Library/Frameworks/EPD64.framework/Versions/7.3/lib/python2.7/site-packages/IPython/extensions/rmagic.pyc in R(self, line, cell, local_ns)
/Library/Frameworks/EPD64.framework/Versions/7.3/lib/python2.7/site-packages/IPython/core/magic.pyc in <lambda>(f, *a, **k)
189 # but it's overkill for just that one bit of state.
190 def magic_deco(arg):
--> 191 call = lambda f, *a, **k: f(*a, **k)
192
193 if callable(arg):
/Library/Frameworks/EPD64.framework/Versions/7.3/lib/python2.7/site-packages/IPython/extensions/rmagic.pyc in R(self, line, cell, local_ns)
579 if return_output and not args.noreturn:
580 if result != ri.NULL:
--> 581 return self.Rconverter(result, dataframe=False)
582
583 __doc__ = __doc__.format(
/Library/Frameworks/EPD64.framework/Versions/7.3/lib/python2.7/site-packages/IPython/extensions/rmagic.pyc in Rconverter(Robj, dataframe)
113 return np.asarray(Robj)
114 Robj = np.rec.fromarrays(Robj, names = names)
--> 115 return np.asarray(Robj)
116
117 @magics_class
/Library/Frameworks/EPD64.framework/Versions/7.3/lib/python2.7/site-packages/numpy/core/numeric.py in asarray(a, dtype, order)
233
234 """
--> 235 return array(a, dtype, copy=False, order=order)
236
237 def asanyarray(a, dtype=None, order=None):
TypeError: __float__ returned non-float (type rpy2.rinterface.NAIntegerType)
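My guess is that this has something to do with the NA values in my CSV data. I did not actually put a value there; the entry is simply blank (e.g. 1,3,,4).
I tried replacing the blank entries with NA, a space, 0, and so on, and I always get the same error. What am I doing wrong?
Edit: I tried using plain rpy2 (without making any changes to my dataset):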
import rpy2.robjects as robjects
myData = robjects.r['read.csv']("/Users/xxx/Documents/data.csv")
print robjects.r['summary'](myData)
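It works fine! So this must be an IPython / Rmagic issue.
Answer 0 (score: 3)
The error occurs because %R in IPython tries to convert the whole CSV file into a single array of dtype float. An NA value in an integer column cannot be converted to float, so an exception is raised.
For example: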
>>> import rpy2.robjects as ro
>>> import numpy as np
>>> myData = ro.r['read.csv']('data.csv')
>>> np.asarray(myData)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python2.7/dist-packages/numpy/core/numeric.py", line 235, in asarray
return array(a, dtype, copy=False, order=order)
TypeError: __float__ returned non-float (type rpy2.rinterface.NAIntegerType)
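The same kind of failure can be reproduced without R or rpy2. The sketch below uses a hypothetical stand-in class, FakeNA (it is not part of rpy2), whose __float__ returns something that is not a float, which is what the traceback message says NAIntegerType does. The helper name try_float is also made up for illustration.

```python
class FakeNA(object):
    """Hypothetical stand-in for rpy2's NAIntegerType (not the real class)."""

    def __float__(self):
        # Deliberately return a non-float, mirroring the behaviour that the
        # traceback message reports for NAIntegerType.
        return self


def try_float(value):
    """Attempt a float conversion and report the TypeError as text, if any."""
    try:
        return float(value)
    except TypeError as exc:
        return "TypeError: %s" % exc


# An ordinary integer converts without trouble.
ok = try_float(3)

# The stand-in object does not: float() rejects the non-float result.
result = try_float(FakeNA())
print(result)
```

Any conversion of such an object to float fails this way, which is why a single NA in an integer column is enough to break the conversion of the whole data frame.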
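A simple workaround is to use the -d / --dataframe flag of %R. Note that we also need the -n / --noreturn flag to make sure we do not try to convert the return value to an array (which would trigger the error again). [Alternatively, we can add a semicolon at the end of the command.]
For example: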
In [1]: %load_ext rmagic
In [2]: %R -n -d myData myData <- read.csv('data.csv')
In [3]: myData
Out[3]:
array([(1, 1, 1, 25, 0.590334, 0.4991572, 0.2189781, 9),
(1, 1, 1, 25, 0.5504164, 0.5007439, 0.2136691, 13),
(1, 1, 1, 25, 0.588486, 0.4879058, 0.2105431, 11),
(1, 1, 1, 25, 0.5882244, 0.5148501, 0.2105431, -2147483648),
(1, 2, 1, 25, nan, 0.489045, 0.2025757, 12)],
dtype=[('replicate', '<i4'), ('line', '<i4'), ('genotype', '<i4'), ('temp', '<i4'), ('femur', '<f8'), ('tibia', '<f8'), ('tarsus', '<f8'), ('SCT', '<i4')])
Note that the NAInteger value has been converted to -2147483648 (which equals numpy.iinfo('<i4').min).
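If that sentinel gets in the way later on, one option is to mask it or to convert the column to float so that it can hold NaN. A minimal sketch, assuming the SCT values from the output above; NA_INTEGER and the other variable names are made up for illustration and are not part of rmagic or rpy2:

```python
import numpy as np

# Smallest 32-bit integer, the value the integer NA was converted to above.
NA_INTEGER = np.iinfo(np.int32).min

# The SCT column from the example output, with the sentinel in the fourth row.
sct = np.array([9, 13, 11, NA_INTEGER, 12], dtype=np.int32)

# Boolean mask marking the rows that were NA in R.
is_na = sct == NA_INTEGER

# Option 1: convert to float so the missing entry can be stored as NaN.
sct_float = sct.astype(np.float64)
sct_float[is_na] = np.nan

# Option 2: keep the integer dtype and hide the sentinel behind a masked array.
sct_masked = np.ma.masked_equal(sct, NA_INTEGER)

# Both variants ignore the missing entry when computing statistics.
mean_from_float = np.nanmean(sct_float)
mean_from_masked = sct_masked.mean()
print(mean_from_float, mean_from_masked)
```

The float conversion is the simpler of the two, while the masked array preserves the original integer dtype.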
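Answer 1 (score: 1)
Judging from the traceback, I would guess that the type of the column was inferred incorrectly (it is treated as a Python float, while the NA is an integer). As it stands, I cannot tell whether this is a problem with IPython or with rpy2 (you would have to try rpy2 on its own). If the column containing the NA really does hold numeric values that look like integers, append .0 to them and see whether that solves the problem.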