Question

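I have a large dataset of genetic and environmental variables, and I am using linear regression. I need to obtain r.squared, adj.r.squared, and the p-value. Running the regressions themselves is no problem, and I can get the summary of each one. I have roughly 20,000 models to compare, and extracting each value individually seems tedious. I assume there must be a relatively straightforward way to do this.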

Below is the code I use to extract the values into a data.frame (b1 is the stored summary of my first model):

df <- data.frame(r.squared = numeric(), adj.r.squared = numeric(), fstatvalue = numeric(), fstatnumdf = numeric(), fstatdendf = numeric())
for (i in 1:10)
{
  # Index by the loop variable i; every row is still filled from b1
  df[i, ] <- c(b1$r.squared, b1$adj.r.squared, b1$fstatistic)
}

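This code creates my data.frame and extracts the data from the same model (b1) 10 times. I have tried several ways to get the model identifier to change with each iteration, without any luck. Does anyone have a suggestion?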

Answer 1

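As @Roland said, put the objects in a list first and everything becomes simple. Assuming your workspace (!!!) holds roughly 20,000 objects named b1, b2, ... b20000, you can add them to a list, extract the summary statistics, and return a data.frame as follows: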

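# Collect the stored summaries b1, b2, ... into a named list
x <- mget( ls( pattern = "^b[0-9]+$" ) )

# Extract the summary statistics from each element
out <- lapply( x , function(s) c( r.squared = s$r.squared , adj.r.squared = s$adj.r.squared , s$fstatistic ) )

# Bind into a data.frame with one row per model
as.data.frame( do.call( rbind , out ) )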
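
The question also asks for p-values, which a summary.lm object does not store for the overall F-test; they can be computed from the fstatistic element with pf(). The following is a minimal sketch of the whole workflow, not the original poster's setup: it assumes the built-in mtcars data as a stand-in for the real dataset, and the names predictors, summaries, extract_stats and stats are hypothetical. It also keeps the summaries in a list from the start, so nothing has to be collected from the workspace afterwards.

```r
# Assumption: mtcars stands in for the real data; predictor names are illustrative
predictors <- c("wt", "hp", "disp")

# Fit one model per predictor and keep the summaries in a named list
summaries <- lapply(predictors, function(p) {
  summary(lm(reformulate(p, response = "mpg"), data = mtcars))
})
names(summaries) <- predictors

# Pull the statistics out of one summary, including the overall F-test p-value
extract_stats <- function(s) {
  f <- s$fstatistic  # named vector: value, numdf, dendf
  c(r.squared     = s$r.squared,
    adj.r.squared = s$adj.r.squared,
    fstat.value   = unname(f["value"]),
    fstat.numdf   = unname(f["numdf"]),
    fstat.dendf   = unname(f["dendf"]),
    p.value       = unname(pf(f["value"], f["numdf"], f["dendf"],
                              lower.tail = FALSE)))
}

# One row per model, with the list names as row names
stats <- as.data.frame(do.call(rbind, lapply(summaries, extract_stats)))
stats$model <- rownames(stats)
```

Using do.call(rbind, ...) on named vectors yields one row per model and one column per statistic, which matches the layout of the data.frame in the question.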

Extracting R^2 from lm summaries into a data.frame
