Using subset in dredge with GAMs

Time: 2014-02-08 18:09:04

Tags: subset

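I am trying to build a set of GAM models from my dataset, but the models must follow some specific rules. So I am using pdredge with the subset argument to restrict which models it generates. Here is my code: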

# Dredge for GAMs
# using Ground_10 as dependent variable, R_Data_v9
# using only observations with no inundation (i.e., xarea=0)

# To run on Rachel's laptop:
# source("C:/Users/Davidson/Documents/Dana Anderson/Japan ignition/R scripts/gam_dredge_1.R")
# or to run on cluster
# source("gam_dredge_1.R")

# LOAD LIBRARIES

library(mgcv)
library(MuMIn)
library(MASS)
library(parallel)

# READ IN DATA

# To read it from Dana's computer:
data<-read.table("C:/.../R_Data_v9_ground_10.txt", header=TRUE)
#note: deleted actual location of data for security reasons.



# Select only those observations with no inundation
data_gm<-subset(data,xarea==0)
print("data loaded")
# Assign variable names
ground_10<-data_gm$ground_10
xpsa03<-data_gm$xpsa03
xpgv<-data_gm$xpgv
xpga<-data_gm$xpga
xii<-data_gm$xii
xpsa10<-data_gm$xpsa10
xpsa30<-data_gm$xpsa30
xres<-data_gm$xres
xcom<-data_gm$xcom
xindus<-data_gm$xindus
xratio<-data_gm$xratio
xdam1<-data_gm$xdam1
xdam2<-data_gm$xdam2
xdam3<-data_gm$xdam3
xpdam1<-data_gm$xpdam1
xpdam2<-data_gm$xpdam2
xpdam3<-data_gm$xpdam3
xdam123<-data_gm$xdam123
xpdam123<-data_gm$xpdam123
xpop<-data_gm$xpop
xestab<-data_gm$xestab
xwood<-data_gm$xwood

# SET UP CLUSTER

# Detect number of cores on computer
detectCores()

# Determine cluster type (mine is a PSOCK)
clusterType <- if(length(find.package("snow", quiet = TRUE))) "SOCK" else "PSOCK"

# Set up a cluster with number of cores specified as result of detectCores() 
#   and call it "clust" 
# For laptop with 4 cores
clust <- makeCluster(getOption("cl.cores", 4), type = clusterType)


# Load required packages onto worker nodes
#   (here, load {mgcv} and {MuMIn}, which pdredge needs on each worker)
clusterEvalQ(clust,library(mgcv))
clusterEvalQ(clust,library(MuMIn))

#GAM RUNS:

gam1<-gam(ground_10~s(xii,k=2)+s(xpga,k=2)+s(xpgv,k=2)+s(xpsa03,k=2)+s(xpsa10,k=2)+s(xpsa30,k=2)+
s(xpop,k=2)+s(xres,k=2)+s(xestab,k=2)+s(xcom,k=2)+
s(xindus,k=2)+
s(xpdam1,k=2)+s(xpdam2,k=2)+s(xpdam3,k=2)+s(xpdam123,k=2)+
s(xwood,k=2)+
s(xdam1,k=2)+s(xdam2,k=2)+s(xdam123,k=2))


# Export data and any objects the modeling function will use 
#    into the cluster worker nodes
clusterExport(clust,c("data_gm","gam1","ground_10", "xii", "xpga", "xpgv", "xpsa03", "xpsa10", "xpsa30", "xpop", "xres", "xestab", "xcom", "xindus","xpdam1", "xpdam2", "xpdam3", "xpdam123", "xwood", "xdam1", "xdam2", "xdam3", "xdam123"))

# Run pdredge using subsetting so as to allow no more than 1 ground motion covariate at a time

pdd.gam1<-pdredge(gam1, cluster=clust,      
        subset=(!('s(xii, k=2)'&'s(xpga, k=2)') & !('s(xii, k=2)' &'s(xpgv, k=2)') & !('s(xii,k=2)'&'s(xpsa03,k=2)') & !('s(xii, k=2)'&'s(xpsa10,k=2)') & !('s(xii, k=2)'&'s(xpsa30,k=2)') & !('s(xpga, k=2)'&'s(xpgv, k=2)')
              & !('s(xpga, k=2)'&'s(xpsa03,k=2)') & !('s(xpga,k=2)' &'s(xpsa10,k=2)') & !('s(xpga,k=2)'&'s(xpsa30,k=2)') & !('s(xpgv,k=2)'&'s(xpsa03,k=2)') & !('s(xpgv,k=2)'&'s(xpsa10,k=2)') & !('s(xpgv,k=2)'&'s(xpsa30,k=2)')
              & !('s(xpsa03,k=2)'&'s(xpsa10,k=2)') & !('s(xpsa03,k=2)'&'s(xpsa30,k=2)') & !('s(xpsa10,k=2)'&'s(xpsa30,k=2)')
              & !('s(xpop,k=2)'&'s(xres,k=2)') & !('s(xpop,k=2)'&'s(xestab,k=2)') & !('s(xpop,k=2)'&'s(xcom,k=2)') & !('s(xres,k=2)'&'s(xestab,k=2)') & !('s(xres,k=2)'&'s(xcom,k=2)') & !('s(xestab,k=2)'&'s(xcom,k=2)')
              & !('s(xpdam1,k=2)'&'s(xpdam123,k=2)') & !('s(xpdam2,k=2)'&'s(xpdam123,k=2)') & !('s(xpdam3,k=2)'&'s(xpdam123,k=2)')
            & !('s(xdam1,k=2)'&'s(xdam2,k=2)') & !('s(xdam1,k=2)'&'s(xdam3,k=2)') & !('s(xdam1,k=2)'&'s(xdam123,k=2)') & !('s(xdam2,k=2)'&'s(xdam3,k=2)') & !('s(xdam2,k=2)'& 's(xdam123,k=2)') & !('s(xdam3,k=2)'&'s(xdam123,k=2)')),rank=function(x) summary(x)$sp.criterion, extra=c(GCV=function(x) summary(x)$sp.criterion, "AIC"))

However, every time I run it, I get the following error:

Error in pdredge(gam1, cluster = clust, subset = (!(s(xii, k=2) & s(xpga, k=2)) &  :
  unrecognized names in 'subset' expression: "s(xii, k = 2)", "s(xpga, k = 2)", "s(xpgv, k = 2)", "s(xpsa03, k = 2)", "s(xpsa10, k = 2)", "s(xpsa30, k = 2)", "s(xpop, k = 2)", "s(xres, k = 2)", "s(xestab, k = 2)", "s(xcom, k = 2)", "s(xpdam1, k = 2)", "s(xpdam123, k = 2)", "s(xpdam2, k = 2)", "s(xpdam3, k = 2)", "s(xdam1, k = 2)", "s(xdam2, k = 2)", "s(xdam3, k = 2)" and "s(xdam123, k = 2)"

Please help me solve this! I have tried everything I can think of! Thanks, Dana

2 answers:

Answer 0 (score: 1)

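I ran into a similar problem. I assume you are following the help on subsetting in dredge(), i.e. "compound model terms (such as 'as-is' expressions within I() or smooths in gam) should be treated as non-syntactic names and enclosed in back-ticks", e.g.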

subset = ‘s(x, k = 2)‘ || ‘I(log(x))‘

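I found I could only get it to work with ` (back-ticks!) rather than the character shown in the help example, i.e. ‘ (which looks to me like a single quote).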
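A base-R sketch of why the quote character matters (no MuMIn needed). R parses text in single quotes as a character constant, but text in backticks as a name (symbol), and "treated as non-syntactic names" in the help refers to the second. The term labels below are made-up examples.

```r
# Base R only: how the parser reads '...' versus `...`
with_quotes    <- quote(!('s(x, k = 2)' & 's(z, k = 2)'))
with_backticks <- quote(!(`s(x, k = 2)` & `s(z, k = 2)`))

# Single quotes give character constants: the expression contains no names
print(all.vars(with_quotes))
# Backticks give symbols (non-syntactic names), so the term labels appear as names
print(all.vars(with_backticks))

print(class(quote('s(x, k = 2)')))   # a character constant
print(class(quote(`s(x, k = 2)`)))   # a name (symbol)
```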

Also, dredge() seems to require that you exactly match the spacing it uses internally.

In my case, my model was

M1<-gam(PLNK ~ s(CX, k=5) + s(CHL, k=5) + s(HC, k=5) + s(sqHUM, k=5) 
+ s(sqHDIST, k=5) + s(SSTL, k=5) + s(WV, k=5) + AT, data=z, 
family=Gamma (link=log))

and I wanted to exclude models containing both WV and SSTL.

subset=!(`s(CHL, k = 5)` & `s(WV, k=5)`) 

did not work, but

subset=!(`s(CHL, k = 5)` & `s(WV, k = 5)`)

did.

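I worked out the internal format dredge uses by running dredge without the subset argument and then looking at the models in the resulting data frame.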
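The spacing rule is consistent with how R itself stores term labels. terms() deparses each term in a standard layout, with a space after each comma and spaces around =, regardless of how the formula was typed. The base-R sketch below uses a shortened version of the model above. terms() does not call s(), so mgcv does not need to be loaded. That dredge matches names against exactly these labels is an assumption, based on the behaviour described in this answer.

```r
# Base R only: terms() stores each term label in deparsed form.
# s() is never called here, so mgcv does not need to be loaded.
f <- PLNK ~ s(CX, k=5) + s(WV,k=5) + AT
term_labels <- attr(terms(f), "term.labels")
print(term_labels)

# A hand-typed name matches only if its spacing is identical
print("s(WV, k = 5)" %in% term_labels)
print("s(WV, k=5)" %in% term_labels)
```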

Answer 1 (score: 1)

You can call the getAllTerms() function on the global model to list all of its terms in the correct form.
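The sketch below combines both answers. Instead of typing the question's 27 exclusion clauses by hand, it generates them from variable names. It is not a MuMIn feature: s2(), all_pairs() and pairs_to_subset() are hypothetical helpers. The labels follow the deparsed "s(x, k = 2)" form, which you can check against getAllTerms(gam1). The question's gam1 has no s(xdam3, k=2) term, so the sketch leaves xdam3 out. A clause naming a term that is missing from the global model would presumably still be reported as unrecognized.

```r
# Hypothetical helpers, not part of MuMIn.
# s2(): write term labels in the deparsed form terms() produces, e.g. "s(xii, k = 2)"
s2 <- function(v) sprintf("s(%s, k = 2)", v)

# all_pairs(): every unordered pair within a group of mutually exclusive terms
all_pairs <- function(g) combn(g, 2, simplify = FALSE)

# pairs_to_subset(): turn list(c(a, b), ...) into the call !(`a` & `b`) & ...
pairs_to_subset <- function(pairs) {
  clauses <- vapply(pairs, function(p) sprintf("!(`%s` & `%s`)", p[1], p[2]),
                    character(1))
  parse(text = paste(clauses, collapse = " & "))[[1]]
}

pairs <- c(
  # at most one ground-motion covariate (15 pairs)
  all_pairs(s2(c("xii", "xpga", "xpgv", "xpsa03", "xpsa10", "xpsa30"))),
  # at most one of xpop, xres, xestab, xcom (6 pairs)
  all_pairs(s2(c("xpop", "xres", "xestab", "xcom"))),
  # xpdam1/2/3 each excluded only together with xpdam123, as in the question (3 pairs)
  lapply(s2(c("xpdam1", "xpdam2", "xpdam3")), function(t) c(t, s2("xpdam123"))),
  # at most one of xdam1, xdam2, xdam123; xdam3 is not in gam1 (3 pairs)
  all_pairs(s2(c("xdam1", "xdam2", "xdam123")))
)

subset_expr <- pairs_to_subset(pairs)
print(length(pairs))   # number of exclusion clauses
print(subset_expr)

# With gam1 and clust from the question (not run here; needs MuMIn):
# stopifnot(all(all.vars(subset_expr) %in% getAllTerms(gam1)))
# pdd.gam1 <- eval(bquote(
#   pdredge(gam1, cluster = clust, subset = .(subset_expr),
#           rank = function(x) summary(x)$sp.criterion)
# ))
```

The bquote() splice is a precaution. If pdredge captures subset unevaluated, the splice puts the literal expression into the call rather than the variable name subset_expr.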