I'm trying to call some R code from Python, but converting the data back into pandas objects does not handle NA values correctly.
Example R code:
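# Numeric column containing an NA
dummy_call_method1 <- function(argument) {
  col_a <- c("A", "A", "B", "B")
  col_b <- c(1, NA, 11, 12)
  return(data.frame(col_a, col_b))
}

# Character column containing an NA (before R 4.0.0, data.frame()
# turns character vectors into factors by default)
dummy_call_method2 <- function(argument) {
  col_a <- c("A", "A", "B", "B")
  col_b <- c("one", NA, "eleven", "twelve")
  return(data.frame(col_a, col_b))
}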
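Assuming R older than 4.0.0 (where `data.frame()` defaults to `stringsAsFactors = TRUE`), `col_b` in `dummy_call_method2` is not a character vector but a factor: a 32-bit integer vector of 1-based codes into a sorted set of levels, with NA stored as `NA_integer_` (the smallest 32-bit integer). The sketch below models that encoding in plain Python/NumPy; `r_style_factor` is a hypothetical helper, not part of R or rpy2. Note that "eleven" ends up as the first level.

```python
import numpy as np

# R's integer NA (NA_integer_) is the smallest 32-bit signed integer.
NA_INTEGER = np.iinfo(np.int32).min


def r_style_factor(values):
    """Hypothetical helper: mimic how R's factor() encodes a character
    vector, i.e. sorted unique levels plus 1-based int32 codes.
    None stands in for R's NA."""
    levels = sorted({v for v in values if v is not None})
    codes = np.array(
        [NA_INTEGER if v is None else levels.index(v) + 1 for v in values],
        dtype=np.int32,
    )
    return codes, levels


codes, levels = r_style_factor(["one", None, "eleven", "twelve"])
print(levels)           # ['eleven', 'one', 'twelve']
print(codes.tolist())   # [2, -2147483648, 1, 3]
```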
Example Python code:
import os

from rpy2 import robjects
from rpy2.robjects import pandas2ri


def r_source(base_dir, filename):
    """Source an R script into the embedded R session."""
    r_script = os.path.join(base_dir, filename)
    r_src = robjects.r['source']
    r_src(r_script)


def r_call_function(func_name, *args):
    """Look up an R function by name and call it with the given arguments."""
    func = robjects.r[func_name]
    result = func(*args)
    return result


r_source('~/workspace/', 'test.R')
dummy_results1 = r_call_function("dummy_call_method1", "")
dummy_results2 = r_call_function("dummy_call_method2", "")

# Print each R data.frame, then its pandas conversion via ri2py
print(dummy_results1)
print(pandas2ri.ri2py(dummy_results1))
print(dummy_results2)
print(pandas2ri.ri2py(dummy_results2))
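I expected ri2py to replace the NA values with None in the string column and with NaN in the numeric column. The numeric case works as expected, but in the string case the NA is replaced with "eleven" for some reason. I can't tell whether it is reading an uninitialized pointer or something else.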
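The numeric case likely works because R's `NA_real_` is itself an IEEE 754 NaN: R builds it with high word `0x7FF00000` and low word `1954`, so any float-based conversion simply sees a NaN. The sketch below reinterprets that bit pattern as a Python float; the constant name `R_NA_REAL_BITS` is hypothetical, and nothing here calls R or rpy2.

```python
import math
import struct

# Bit pattern of R's NA_real_: exponent all ones, low word 1954 (0x7A2).
R_NA_REAL_BITS = 0x7FF00000000007A2

# Reinterpret the 64 bits as a little-endian double.
na_real = struct.unpack("<d", struct.pack("<Q", R_NA_REAL_BITS))[0]

print(math.isnan(na_real))  # True
```

Factor codes have no such luck: `NA_integer_` is an ordinary integer, so a converter has to recognize it explicitly.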
Here is the output; note the unexpected behavior:
col_a col_b
1 A 1
2 A NA
3 B 11
4 B 12
col_a col_b
1 A 1.0
2 A NaN
3 B 11.0
4 B 12.0
col_a col_b
1 A one
2 A <NA>
3 B eleven
4 B twelve
col_a col_b
1 A one
2 A eleven #This is incorrect
3 B eleven
4 B twelve
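As a sketch of what an NA-aware conversion of a factor could look like, the code below turns 1-based R factor codes plus levels into a `pandas.Categorical`, mapping `NA_integer_` to pandas' missing-value code `-1` instead of shifting it like a regular code. `factor_to_categorical` is a hypothetical helper and does not use rpy2; the codes and levels are the ones R would assign to `col_b` in `dummy_call_method2`. If the R side can be changed, passing `stringsAsFactors = FALSE` to `data.frame()` avoids the factor code path entirely, although how `NA_character_` is then converted depends on the rpy2 version.

```python
import numpy as np
import pandas as pd

# R stores NA inside a factor's integer codes as NA_integer_ (INT32 min).
NA_INTEGER = np.iinfo(np.int32).min


def factor_to_categorical(codes, levels):
    """Hypothetical helper: convert R factor data (1-based codes plus a
    list of levels) to a pandas Categorical with NA kept as missing."""
    codes = np.asarray(codes, dtype=np.int64)
    # R codes are 1-based, pandas codes are 0-based with -1 for missing,
    # so NA_integer_ must be mapped explicitly rather than shifted.
    pd_codes = np.where(codes == NA_INTEGER, -1, codes - 1)
    return pd.Categorical.from_codes(pd_codes, categories=levels)


col_b = factor_to_categorical([2, NA_INTEGER, 1, 3], ["eleven", "one", "twelve"])
print(pd.DataFrame({"col_a": ["A", "A", "B", "B"], "col_b": col_b}))
```

With this mapping the second row comes out as a missing value rather than "eleven".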