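My data is saved in a .txt file. The same file holds rows for 200 different words. How can I read this raw data into R and run a binary logistic regression for each of these words?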
num 0 0.010752688172
num 0 0.003300330033
thanksgiving 0 0.0123456790123
thanksgiving 0 0.0016339869281
thanksgiving 0 0.00338983050847
off 0 0.00431034482759
off 0 0.00302114803625
off 1 0.001100110011
off 0 0.00377358490566
off 1 0.00166112956811
off 1 0.00281690140845
off 0 0.00564971751412
off 0 0.00112994350282
off 0 0.003300330033
off 0 0.0042735042735
off 1 0.00326797385621
off 0 0.00159489633174
off 0 0.00378787878788
Answer 0 (score: 3)
Well, I'm lazy, so:
allwords <- unique(dataframe[,1])
firstword <- dataframe[dataframe[,1]==allwords[1],]
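and so on will split your data up word by word. However, you don't need to create firstword, secondword, ... by hand, because it is just as easy to run the regression over every entry of allwords with one of the apply functions.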
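As a rough illustration of the apply-style approach described above, here is a minimal base R sketch. It uses the sample rows from the question; the column names word, y and x are assumptions chosen for readability, and the step that skips words with only one outcome value is an added assumption rather than part of the original answer (in the sample, num and thanksgiving only ever have outcome 0, so a logistic fit for them would be degenerate).

```r
# Sample rows copied from the question; in practice, read the .txt file instead
dataframe <- read.table(text = "
num 0 0.010752688172
num 0 0.003300330033
thanksgiving 0 0.0123456790123
thanksgiving 0 0.0016339869281
thanksgiving 0 0.00338983050847
off 0 0.00431034482759
off 0 0.00302114803625
off 1 0.001100110011
off 0 0.00377358490566
off 1 0.00166112956811
off 1 0.00281690140845
off 0 0.00564971751412
off 0 0.00112994350282
off 0 0.003300330033
off 0 0.0042735042735
off 1 0.00326797385621
off 0 0.00159489633174
off 0 0.00378787878788
", col.names = c("word", "y", "x"), stringsAsFactors = FALSE)

# One data frame per word
pieces <- split(dataframe, dataframe$word)

# Assumption: skip words whose outcome is constant, since a logistic
# regression is not meaningfully estimable without both 0s and 1s
pieces <- Filter(function(d) length(unique(d$y)) == 2, pieces)

# Fit y ~ x separately for each remaining word
models <- lapply(pieces, function(d) glm(y ~ x, family = binomial, data = d))

# Collect the coefficients, one row per word
coefs <- t(sapply(models, coef))
print(coefs)
```

With the full 200-word file, the same split and lapply steps should apply unchanged; only the read.table call would point at the file.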
Answer 1 (score: 1)
Here is how I would do it using the plyr package:
# Load the plyr library
library(plyr)
# Read in the data
allwords <- read.table("words.txt")
# Name the variables more meaningfully than this
names(allwords) <- c("word", "y", "x")
# dlply iterates over the data.frame, splitting by "word",
# and running a glm with the arguments formula = y ~ x and family = binomial
# and returns a list of the resulting glm objects
models <- dlply(allwords,
                .variables = "word",
                .fun = glm, formula = y ~ x, family = binomial)
# It's then easy to iterate over that list using lapply, llply, ldply, etc.
# (depending on what you want back out)
# Summarize:
llply(models, summary)
# Get all the coefficients
ldply(models, coef)
# Get AICs
# Not that you can compare these among word-models, but you get the idea.
ldply(models, AIC)
# Or, if you want to work with a particular model
models$num
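If what you want back out is the slope and its p-value per word rather than the raw coefficients, something along these lines may help. This is only a sketch in base R: it builds a one-element named list of glm objects from the "off" rows in the question so that it runs on its own, and the names slope_table, estimate, std_error and p_value are made up for illustration. A list returned by dlply should work in place of the hand-built list, since it is also a named list of glm objects.

```r
# Rows for the word "off", copied from the question
off <- data.frame(
  y = c(0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0),
  x = c(0.00431034482759, 0.00302114803625, 0.001100110011,
        0.00377358490566, 0.00166112956811, 0.00281690140845,
        0.00564971751412, 0.00112994350282, 0.003300330033,
        0.0042735042735, 0.00326797385621, 0.00159489633174,
        0.00378787878788)
)

# Stand-in for the list of per-word models
models <- list(off = glm(y ~ x, family = binomial, data = off))

# coef(summary(model)) is a matrix with one row per term and the columns
# "Estimate", "Std. Error", "z value" and "Pr(>|z|)" for a binomial glm
slope_table <- do.call(rbind, lapply(names(models), function(w) {
  est <- coef(summary(models[[w]]))
  data.frame(word = w,
             estimate = est["x", "Estimate"],
             std_error = est["x", "Std. Error"],
             p_value = est["x", "Pr(>|z|)"])
}))
print(slope_table)
```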