In R

Time: 2019-11-24 08:56:28

Tags: r

I want to run heckit to correct my results for sample selection bias. Here is my code:

ht <- heckit(participation ~ log(friends + 2) + log(followers + 2) + subjectivity.bnk,
             delta_polarity ~ subjectivity.bnk, df)

My IV is subjectivity.bnk and my DV is delta_polarity. I use friends and followers as my Heckman instruments. When I run the code, I get the following error:

Error in binaryChoice(formula, ..., userLogLik = loglik, weights = weights) : 
  the left hand side of the 'formula' has to contain exactly two levels (e.g. FALSE and TRUE)

Here is a sample of my data.

Thanks.

1 answer:

Answer 0 (score: 1)

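The observations (rows) that have 0 for participation also have NA for subjectivity.bnk. So when you include subjectivity.bnk in the first part (the selection equation), those rows are dropped as incomplete and you end up with only 1s for participation. A single level is left on the left-hand side, which is exactly what the error complains about:

library(sampleSelection)
df <- read.delim("ht.csv - ht.csv.tsv", stringsAsFactors = FALSE)

# participation itself is fine: both levels are present
table(df$participation)
#   0   1
# 117 382

# every row with NA for subjectivity.bnk has 0 for participation
table(df$participation, is.na(df$subjectivity.bnk))
#     FALSE TRUE
#   0     0  117
#   1   382    0

# this works: subjectivity.bnk is left out of the selection equation
ht <- heckit(participation ~ log(friends + 2) + log(followers + 2),
             delta_polarity ~ subjectivity.bnk, df)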
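
You can reproduce the effect of the dropped rows without the original file. Below is a minimal base-R sketch, assuming a made-up data frame `toy` with a hypothetical regressor `x` that is only observed for participants. `model.frame()` is used here as a stand-in for the NA handling that happens during fitting; it is not meant to mirror heckit's actual internals.

```r
# Toy data (hypothetical): x is NA for every non-participant.
set.seed(1)
toy <- data.frame(
  participation = c(rep(0, 5), rep(1, 10)),
  friends       = rpois(15, 20),
  x             = c(rep(NA, 5), rnorm(10))
)

# model.frame() applies the default na.action (na.omit),
# so rows with NA in any variable of the formula are dropped.
mf_with_x    <- model.frame(participation ~ log(friends + 2) + x, data = toy)
mf_without_x <- model.frame(participation ~ log(friends + 2), data = toy)

# Distinct values of the response that survive in each case.
levels_with_x    <- sort(unique(mf_with_x$participation))
levels_without_x <- sort(unique(mf_without_x$participation))

print(levels_with_x)     # only one level left once x is in the formula
print(levels_without_x)  # both levels kept when x is left out
```

Running the same kind of check on your own selection formula before calling heckit should tell you quickly whether a regressor is wiping out one of the two participation levels.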