How do I create empty cells in a pandas DataFrame that R recognizes as empty when using rpy2?

Asked: 2019-03-30 22:43:23

Tags: python r pandas numpy dataframe

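I am trying to do the following, and it works except for one problem: R does not recognize the empty cells as empty. This shows up as an error in which R complains that the grouping factor has more than two levels; R treats the cells marked "nan" as if they were not empty.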

import numpy as np
import pandas as pd

# Set up the df
d = {'col1': [1, 2, 3, 4, 3, 3, 2, 2], 'col2': [1, 2, 3, 4, 3, 3, 2, 2]}
df = pd.DataFrame(data=d)
df['valence_median_split'] = ''

# Get median of valence
valence_median = df['col1'].median()
df['valence_median_split'] = np.where(df['col2'] < valence_median, 'Low_Valence', 'High_Valence')
df['temp_selection'] = np.nan
low = df.loc[df['valence_median_split'] == 'Low_Valence', 'valence_median_split'].sample(n=2).index
high = df.loc[df['valence_median_split'] == 'High_Valence', 'valence_median_split'].sample(n=2).index
df['temp_selection'] = np.select([df.index.isin(low), df.index.isin(high)], ['Low', 'High'], default= np.nan)

# Push it to R and run a t-test
%Rpush df
%R colnames(df)
%R All_Valence_Mean_Res <- t.test(col2 ~ temp_selection, data = df, var.equal = TRUE)

Error:

Error in t.test.formula(col2 ~ temp_selection, data = df, var.equal = TRUE) : 
  grouping factor must have exactly 2 levels

Checking in Python confirms that the df really does have more than two unique values:

df['temp_selection'].unique()
array(['Low', 'nan', 'High'], dtype=object)

I tried setting df['valence_median_split'] to '' as well as to np.nan, and both seem to produce this problem in R.

1 Answer:

Answer 0: (score: 0)

This is small enough that you can look at the whole df:

In [821]: df                                                                    
Out[821]: 
   col1  col2 valence_median_split temp_selection
0     1     1          Low_Valence            nan
1     2     2          Low_Valence            nan
2     3     3         High_Valence            nan
3     4     4         High_Valence            nan
4     3     3         High_Valence           High
5     3     3         High_Valence           High
6     2     2          Low_Valence            Low
7     2     2          Low_Valence            Low
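The 'nan' entries printed above match the `array(['Low', 'nan', 'High'], dtype=object)` output in the question: they appear to be the three-character string 'nan', not a missing value. A minimal sketch to check this, assuming a hypothetical Series `s` that reproduces the column as displayed (the name `s` and the use of `Series.mask` are illustrative, not from the original post):

```python
import pandas as pd

# Hypothetical reconstruction of temp_selection as printed above:
# the literal string 'nan', not a real missing value.
s = pd.Series(['nan', 'nan', 'nan', 'nan', 'High', 'High', 'Low', 'Low'],
              dtype=object)

print(s.isna().sum())  # 0: pandas sees no missing values here
print(s.nunique())     # 3: 'nan' counts as a third category

# One way to turn the string into a real missing value:
# mask() replaces entries where the condition is True with NaN.
cleaned = s.mask(s == 'nan')

print(cleaned.isna().sum())  # 4
print(cleaned.nunique())     # 2: only 'High' and 'Low' remain
```

If the column behaves like `s` here, R would receive three distinct strings, which would be consistent with the "exactly 2 levels" error.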

In what sense are the nan values supposed to be "empty"?
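A possible workaround, sketched under a few assumptions: the string 'nan' seems to come from `np.select` combining the string choices and the `np.nan` default into a single string array. Building the column with `.loc` assignments on an object column keeps the unselected rows truly missing, and dropping those rows before pushing the frame leaves only two groups. The `random_state` arguments and the name `df_selected` are additions for illustration; how rpy2 converts missing values in an object column may depend on the rpy2 version, which is why the rows are dropped on the Python side here.

```python
import numpy as np
import pandas as pd

d = {'col1': [1, 2, 3, 4, 3, 3, 2, 2], 'col2': [1, 2, 3, 4, 3, 3, 2, 2]}
df = pd.DataFrame(data=d)

valence_median = df['col1'].median()
df['valence_median_split'] = np.where(
    df['col2'] < valence_median, 'Low_Valence', 'High_Valence')

# random_state is an assumption added so the sketch is reproducible.
low = df[df['valence_median_split'] == 'Low_Valence'].sample(
    n=2, random_state=0).index
high = df[df['valence_median_split'] == 'High_Valence'].sample(
    n=2, random_state=0).index

# Start from an object column of real missing values (None),
# then label only the sampled rows.
df['temp_selection'] = pd.Series([None] * len(df), index=df.index, dtype=object)
df.loc[low, 'temp_selection'] = 'Low'
df.loc[high, 'temp_selection'] = 'High'

# Keep only the labelled rows, so the grouping column has two values.
df_selected = df.dropna(subset=['temp_selection'])
print(df_selected['temp_selection'].unique())
```

In the notebook, `df_selected` could then be pushed with `%Rpush df_selected` and used as `data = df_selected` in the `t.test` call; that last step is not verified here.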