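I am trying to use R (via the rpy2 package) to run t-tests on some variables in a pandas DataFrame. I am using magic functions in a Jupyter notebook to let Python interact with R. The interaction works, except inside a loop.
Here is the data frame: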
df.head()
Out[60]:
ID Category Num Vert_Horizon Description Fem_Valence_Mean \
0 Animals_001_h Animals 1 h Dead Stork 2.40
1 Animals_002_v Animals 2 v Lion 6.31
2 Animals_003_h Animals 3 h Snake 5.14
3 Animals_004_v Animals 4 v Wolf 4.55
4 Animals_005_h Animals 5 h Bat 5.29
Fem_Valence_SD Fem_Av/Ap_Mean Fem_Av/Ap_SD Arousal_Mean ... \
0 1.30 3.03 1.47 6.72 ...
1 2.19 5.96 2.24 6.69 ...
2 1.19 5.14 1.75 5.34 ...
3 1.87 4.82 2.27 6.84 ...
4 1.56 4.61 1.81 5.50 ...
Luminance Contrast JPEG_size80 LABL LABA LABB Entropy \
0 126.05 68.45 263028 51.75 -0.39 16.93 7.86
1 123.41 32.34 250208 52.39 10.63 30.30 6.71
2 135.28 59.92 190887 55.45 0.25 4.41 7.83
3 122.15 75.10 282350 49.84 3.82 1.36 7.69
4 131.81 59.77 329325 54.26 -0.34 -0.95 7.82
Classification valence_median_split temp_selection
0 Low_Valence OUT
1 High_Valence NaN
2 Low_Valence OUT
3 Low_Valence OUT
4 Low_Valence OUT
[5 rows x 35 columns]
Here is what I am trying to do:
%Rpush df
Variables = 'All_Valence_Mean', 'Male_Valence_Mean', 'Fem_Valence_Mean'
for var in Variables:
    %R var + '_Sig' <- t.test(var ~ valence_median_split, data = df, var.equal = TRUE)
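I am trying to save each result to a variable named after 'var' with the string "_Sig" appended. That part is not essential; what I really want is for this command to recognize 'var' as each of the variables in the Variables list in turn.
Here is the error I get: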
Error in model.frame.default(formula = var ~ valence_median_split, data = df) :
invalid type (list) for variable 'var'
Error in model.frame.default(formula = var ~ valence_median_split, data = df) :
invalid type (list) for variable 'var'
Error in model.frame.default(formula = var ~ valence_median_split, data = df) :
invalid type (list) for variable 'var'
/anaconda3/lib/python3.7/site-packages/rpy2/rinterface/__init__.py:146: RRuntimeWarning: Error in model.frame.default(formula = var ~ valence_median_split, data = df) :
invalid type (list) for variable 'var'
warnings.warn(x, RRuntimeWarning)
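Answer 0 (score: 1)
If you are more comfortable with R, push as much of the logic as possible into R. For example, the following stores the results in results, which you will then be able to access from Python in subsequent notebook cells.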
%%R -i df -o results
Variables <- c("All_Valence_Mean", "Male_Valence_Mean",
               "Fem_Valence_Mean")
results <- list()
for (var in Variables) {
    results[[paste0(var, '_Sig')]] <- t.test(
        as.formula(paste(var, '~ valence_median_split')),
        data = df, var.equal = TRUE)
}
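The loop above works because the formula is assembled from a string, so the column name is substituted in rather than R reading 'var' as a literal name. As a rough illustration only, the same name and formula construction can be sketched in plain Python. The helper build_formulas is hypothetical and is not part of rpy2 or R:

```python
def build_formulas(variables, group="valence_median_split"):
    """Map each result name ('<var>_Sig') to its formula string ('<var> ~ <group>').

    Hypothetical helper for illustration; it only builds strings and
    does not call R.
    """
    return {f"{var}_Sig": f"{var} ~ {group}" for var in variables}


variables = ("All_Valence_Mean", "Male_Valence_Mean", "Fem_Valence_Mean")
formulas = build_formulas(variables)

# Show which formula each result name would be paired with.
for name, formula in formulas.items():
    print(name, "->", formula)
```

Each formula string is what as.formula (in R) or Formula (in rpy2) would then receive.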
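If you are more comfortable with Python, do as much as possible in Python:
from rpy2.robjects import Formula
from rpy2.robjects.packages import importr

# R's stats package, which provides t.test (exposed by rpy2 as t_test)
stats = importr('stats')

Variables = ('All_Valence_Mean', 'Male_Valence_Mean',
             'Fem_Valence_Mean')
results = dict()
for var in Variables:
    # Build the formula from a string so each column name is substituted in.
    # Assumes df can be converted to an R data.frame (for example with the
    # pandas conversion from rpy2.robjects.pandas2ri enabled).
    results['%s_Sig' % var] = stats.t_test(
        Formula('%s ~ valence_median_split' % var),
        data=df, var_equal=True)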
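As a sanity check on what var_equal=True (var.equal = TRUE in R) asks for, the pooled-variance (Student) t statistic can be computed by hand. The following is a minimal sketch using only the Python standard library. The function name pooled_t_statistic and the sample values are made up for illustration and do not come from the data frame in the question:

```python
import math
import statistics


def pooled_t_statistic(a, b):
    """Return (t, degrees_of_freedom) for a two-sample t-test
    that assumes equal variances in both groups."""
    n1, n2 = len(a), len(b)
    m1, m2 = statistics.mean(a), statistics.mean(b)
    # statistics.variance is the sample variance (n - 1 denominator).
    v1, v2 = statistics.variance(a), statistics.variance(b)
    dof = n1 + n2 - 2
    pooled_var = ((n1 - 1) * v1 + (n2 - 1) * v2) / dof
    std_err = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (m1 - m2) / std_err, dof


# Illustrative values only, standing in for the two valence groups.
low_group = [1.0, 2.0, 3.0]
high_group = [2.0, 3.0, 4.0]
t_stat, dof = pooled_t_statistic(low_group, high_group)
print(t_stat, dof)
```

The t value and degrees of freedom from this sketch should be comparable to the statistic and parameter entries of the object returned by R's t.test for the same two groups. The sign depends on which group R treats as first.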