Question

I'd like to know the most efficient way to handle a repetitive task.

I have many databases in spreadsheet/CSV format that look like this (0/T/F variables):

id_code, age,  heart_disease, weight, gender, operated, survived, ct_scan, days_hospitalized
1332,    43.2, 1,             213,    m,      0,        1,        1,       12
22322,   76.4, 0,             125,    f,      1,        0,        0,       45
995,     55,   1,             199,    m,      0,        1,        0,       34

To run t-tests on the continuous variables in survivors versus non-survivors:

myfx1 <- function(x) {t.test((x), mydat$survived)}
myfx1(mydat$age)
myfx1(mydat$weight)

Then I replace 'survived' with another variable and repeat.
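One thing worth checking here: `t.test(x, y)` treats `x` and `y` as two separate samples, so the calls above test `age` against the 0/1 values of `survived` themselves. If the goal is to compare `age` between survivors and non-survivors, the formula interface `t.test(value ~ group)` is probably closer to what is wanted. Below is a minimal sketch under that assumption. The helper name `ttest_by` and the toy data are made up for illustration; the grouping column is passed as an argument so it can be swapped without rewriting the function.

```r
# Toy data standing in for mydat (values are made up for illustration)
set.seed(1)
mydat <- data.frame(
  age      = runif(40, 40, 80),
  weight   = runif(40, 105, 250),
  survived = rep(0:1, each = 20)
)

# Hypothetical helper: t-test of one column, split by a two-level grouping column
ttest_by <- function(col, group, data) {
  t.test(data[[col]] ~ data[[group]])
}

# Run the test for every continuous variable of interest
num_vars <- c("age", "weight")
results <- lapply(num_vars, ttest_by, group = "survived", data = mydat)
names(results) <- num_vars

# Pull the p-values into a named numeric vector
p_values <- sapply(results, function(r) r$p.value)
print(p_values)
```

To repeat the analysis for another binary variable, only the `group` argument changes, e.g. `group = "operated"` if that column exists in the data.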

To build contingency tables (cross-tabulations) for survivors versus non-survivors:

myfx2 <- function(x) {xtabs(~mydat$survived+x, data=mydat)}
myfx2(mydat$gender)
myfx2(mydat$operated)
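The same loop-over-columns idea can be applied to the cross-tabs. The sketch below is only one way to do it: it builds the formula from column names with `reformulate()` and runs `xtabs()` once per categorical variable. The helper name `xtab_by` and the six-row toy data are assumptions for illustration.

```r
# Toy data standing in for mydat (values are made up for illustration)
mydat <- data.frame(
  gender   = c("m", "f", "m", "f", "m", "f"),
  operated = c(0, 1, 0, 1, 1, 0),
  survived = c(1, 0, 1, 1, 0, 0)
)

# Hypothetical helper: cross-tabulate one column against a grouping column.
# reformulate(c("survived", "gender")) builds the formula ~survived + gender.
xtab_by <- function(col, group, data) {
  xtabs(reformulate(c(group, col)), data = data)
}

# One table per categorical variable, stored in a named list
cat_vars <- c("gender", "operated")
tables <- lapply(cat_vars, xtab_by, group = "survived", data = mydat)
names(tables) <- cat_vars

print(tables$gender)
```

Because the tables come back as a named list, a follow-up step such as `lapply(tables, summary)` can be run over all of them at once.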

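I have tried plyr and doBy, but the examples I find always demonstrate them with mean/variance or other simple functions. What is the simplest and most efficient way to handle a large number of variables?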

Answer 1

The plyr package has a lovely little function, colwise, that runs a function column by column.

colwise(myfx1)(your_db[, your_numeric_columns])

Update:

library(plyr)

# Simulated data: 500 patients
id_code <- sample(1:1000, 500)
age <- sample(40:80, 500, replace = TRUE)
heart_disease <- sample(0:1, 500, replace = TRUE)
weight <- sample(105:250, 500, replace = TRUE)
operated <- sample(0:1, 500, replace = TRUE)
survived <- sample(0:1, 500, replace = TRUE)
ctscan <- sample(12:45, 500, replace = TRUE)

dat <- data.frame(id_code, age, heart_disease, weight, operated, survived, ctscan)

# p-value from t.test() of a column against dat$survived
fx1 <- function(x) t.test(x, dat$survived)$p.value

# Apply fx1 to every column except id_code
colwise(fx1)(dat[, 2:ncol(dat)])

Works for me... as an example.
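For anyone who would rather not load plyr, base R's `vapply()` gives the same column-by-column pattern. This is only a sketch, and it makes two assumptions that differ from the code above: it uses the formula form `t.test(x ~ group)` so that each column is split into `survived == 0` and `survived == 1`, and it leaves the grouping column itself out of the set of tested columns. The data are simulated.

```r
# Simulated data (same idea as above, fewer columns)
set.seed(42)
dat <- data.frame(
  age      = sample(40:80, 500, replace = TRUE),
  weight   = sample(105:250, 500, replace = TRUE),
  ctscan   = sample(12:45, 500, replace = TRUE),
  survived = sample(0:1, 500, replace = TRUE)
)

# Columns to test: everything except the grouping column
test_cols <- setdiff(names(dat), "survived")

# One p-value per column, comparing survived == 0 with survived == 1
p_values <- vapply(
  dat[test_cols],
  function(x) t.test(x ~ dat$survived)$p.value,
  numeric(1)
)
print(p_values)
```

`vapply()` is used instead of `sapply()` so that the result is guaranteed to be a numeric vector with one entry per column.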

R: run multiple functions on multiple columns based on a binary column variable
