How can I load this data into data frames and run a binary logistic regression?

Posted: 2012-06-15 18:42:54

Tags: r

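The data is stored in a .txt file, with 200 words saved in the same text file. How do I get this raw data into R and run a binary logistic regression for each of these words?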

num 0 0.010752688172
num 0 0.003300330033

thanksgiving 0 0.0123456790123
thanksgiving 0 0.0016339869281
thanksgiving 0 0.00338983050847

off 0 0.00431034482759
off 0 0.00302114803625
off 1 0.001100110011
off 0 0.00377358490566
off 1 0.00166112956811
off 1 0.00281690140845
off 0 0.00564971751412
off 0 0.00112994350282
off 0 0.003300330033
off 0 0.0042735042735
off 1 0.00326797385621
off 0 0.00159489633174
off 0 0.00378787878788

2 Answers

Answer 0 (score: 3)

Well, I'm lazy, so:

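# Assumes the .txt file has already been read into `dataframe` (word in column 1)
allwords <- unique(dataframe[,1])                    # every distinct word
firstword <- dataframe[dataframe[,1]==allwords[1],]  # all rows for the first word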

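and so on will split your data up word by word. However, you don't need to create firstword, secondword, ..., because it is just as easy to run the regression for each value of allwords with one of the apply functions.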
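A minimal base-R sketch of that apply approach, assuming the word is in column 1, the 0/1 outcome in column 2 and the predictor in column 3 (the column names y and x are assumptions, not from the question). The sample rows from the question are inlined with read.table(text = ...) so the sketch runs on its own:

```r
# Sample rows from the question, inlined so the sketch runs on its own.
# With the real file you would call read.table() on the file path instead
# (e.g. "words.txt", a hypothetical file name).
dataframe <- read.table(text = "
num 0 0.010752688172
num 0 0.003300330033
thanksgiving 0 0.0123456790123
thanksgiving 0 0.0016339869281
thanksgiving 0 0.00338983050847
off 0 0.00431034482759
off 0 0.00302114803625
off 1 0.001100110011
off 0 0.00377358490566
off 1 0.00166112956811
off 1 0.00281690140845
off 0 0.00564971751412
off 0 0.00112994350282
off 0 0.003300330033
off 0 0.0042735042735
off 1 0.00326797385621
off 0 0.00159489633174
off 0 0.00378787878788
", col.names = c("word", "y", "x"), stringsAsFactors = FALSE)

# Every distinct word, in order of first appearance
allwords <- unique(dataframe[, 1])

# One binary logistic regression (y ~ x) per word, fitted with lapply
models <- lapply(allwords, function(w) {
  glm(y ~ x, family = binomial, data = dataframe[dataframe[, 1] == w, ])
})
names(models) <- allwords

# Coefficients as a matrix: one row per word, columns (Intercept) and x
coefs <- t(sapply(models, coef))
print(coefs)
```

Note that when a word has only 0 outcomes (num and thanksgiving in this sample), the maximum-likelihood estimates do not exist, so R may emit convergence warnings and the coefficients for those words are not meaningful.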

Answer 1 (score: 1)

Here is how I would do it with the plyr package:

# Load the plyr library
library(plyr)

# Read in the data
allwords <- read.table("words.txt")

# Name the variables more meaningfully than this
names(allwords) <- c("word", "y", "x")

# dlply iterates over the data.frame, splitting by "word", 
# and running a glm with the arguments formula = y ~ x and family = binomial
# and returns a list of the resulting glm objects
models <- dlply(allwords,
                .var = "word",
                .fun = glm, formula = y ~ x, family = binomial)

# It's then easy to iterate over that list using lapply, llply, ldply, etc.
# (depending on what you want back out)
# Summarize:
llply(models, summary)

# Get all the coefficients
ldply(models, coef)

# Get AICs
# Not that you can compare these among word-models, but you get the idea.
ldply(models, AIC)

# Or, if you want to work with a particular model
models$num
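If you would rather not depend on plyr, a rough base-R equivalent is sketched below. It swaps dlply() for split() plus lapply(), so it is an alternative technique rather than the answerer's code. The data frame is rebuilt from the question's sample rows, and the column names follow the names() call in the plyr code:

```r
# Sample rows from the question; column names follow the plyr answer (word, y, x)
allwords <- data.frame(
  word = rep(c("num", "thanksgiving", "off"), times = c(2, 3, 13)),
  y = c(0, 0,
        0, 0, 0,
        0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0),
  x = c(0.010752688172, 0.003300330033,
        0.0123456790123, 0.0016339869281, 0.00338983050847,
        0.00431034482759, 0.00302114803625, 0.001100110011,
        0.00377358490566, 0.00166112956811, 0.00281690140845,
        0.00564971751412, 0.00112994350282, 0.003300330033,
        0.0042735042735, 0.00326797385621, 0.00159489633174,
        0.00378787878788),
  stringsAsFactors = FALSE
)

# split() gives one data frame per word; lapply() fits a logistic glm to each
models <- lapply(split(allwords, allwords$word),
                 function(d) glm(y ~ x, family = binomial, data = d))

# Rough counterpart of ldply(models, coef): one row of coefficients per word
coefs <- do.call(rbind, lapply(models, coef))

# Wald test row (Estimate, Std. Error, z value, Pr(>|z|)) for the slope x
slope_tests <- do.call(rbind, lapply(models, function(m) coef(summary(m))["x", ]))
print(slope_tests)
```

split() names the pieces by word and orders them alphabetically, so models$off works the same way as in the plyr version.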