I have this data frame:
ID <- c(1,1,2,3,3,3,4,5,6,6)
linguistic_fluency <- c("good", "very good", "bad", "bad", "very bad", "very good", "good", "very good", "normal", "very bad")
survey_year <- c(2007, 2008, 2009, 2009, 2008, 2007, 2007, 2008, 2007, 2008)
data <- data.frame(ID, linguistic_fluency, survey_year)
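I want to check whether the survey participants report their linguistic fluency consistently over the years. To do that, I would like a table with the values at t-1 in the columns and the values at t in the rows.
Thank you very much for your help.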
Answer 0: (score: 1)
You can lag the variable and then build a frequency table. For example:
# Re-order the factor levels first
data$linguistic_fluency <- factor(data$linguistic_fluency,
                                  levels = c("very bad", "bad", "normal", "good", "very good"))
library(Hmisc) # load library containing Lag() function
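# apply Lag() within each respondent (ID)
data$Lag_fluency <- unlist(tapply(data$linguistic_fluency, data$ID, function(x) Lag(x, 1)))
# This gives the data frame below. Lag() returns NA for each respondent's first row,
# so respondents with only one observation have NA throughout.
# Note: Lag() follows row order, not survey_year. Respondent 3's rows run 2009, 2008, 2007,
# so sort by ID and survey_year first if the lag should mean "previous year".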
> data
   ID linguistic_fluency survey_year Lag_fluency
1   1               good        2007        <NA>
2   1          very good        2008        good
3   2                bad        2009        <NA>
4   3                bad        2009        <NA>
5   3           very bad        2008         bad
6   3         very good        2007    very bad
7   4               good        2007        <NA>
8   5          very good        2008        <NA>
9   6             normal        2007        <NA>
10  6           very bad        2008      normal
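Because the lag is taken in row order, it only means "previous survey" when each respondent's rows are in chronological order. Here is a minimal sketch of that step in base R, without Hmisc. It assumes every respondent was surveyed in consecutive years, so the previous row really is t-1; the names `lvls` and `lag_chr` are only illustrative.

```r
ID <- c(1,1,2,3,3,3,4,5,6,6)
linguistic_fluency <- c("good", "very good", "bad", "bad", "very bad", "very good",
                        "good", "very good", "normal", "very bad")
survey_year <- c(2007, 2008, 2009, 2009, 2008, 2007, 2007, 2008, 2007, 2008)
data <- data.frame(ID, linguistic_fluency, survey_year)

lvls <- c("very bad", "bad", "normal", "good", "very good")
data$linguistic_fluency <- factor(data$linguistic_fluency, levels = lvls)

# Sort so that, within each respondent, rows run from earliest to latest year
data <- data[order(data$ID, data$survey_year), ]

# Lag within respondent: shift values down by one row, NA for the first row
lag_chr <- ave(as.character(data$linguistic_fluency), data$ID,
               FUN = function(x) c(NA_character_, head(x, -1)))
data$Lag_fluency <- factor(lag_chr, levels = lvls)

print(data)
```

With the rows sorted, respondent 3's lagged values follow the 2007, 2008, 2009 sequence instead of the reverse.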
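What you need then is the frequency table of the lagged variable against the original one. With the arguments in this order, the rows are the lagged value (t-1) and the columns are the current value (t); swap the two arguments to get t in the rows and t-1 in the columns: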
> table(data$Lag_fluency, data$linguistic_fluency)
            very bad bad normal good very good
  very bad         0   0      0    0         1
  bad              1   0      0    0         0
  normal           1   0      0    0         0
  good             0   0      0    0         1
  very good        0   0      0    0         0
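For the orientation asked for in the question (t in the rows, t-1 in the columns), one alternative sketch matches each answer to the same respondent's answer from exactly one year earlier with a self-join, so it does not depend on row order. This is a different technique from the `Lag()` approach above, not a restatement of it; the names `prev`, `fluency_prev`, `pairs` and `transitions` are made up for the illustration.

```r
data <- data.frame(
  ID = c(1,1,2,3,3,3,4,5,6,6),
  linguistic_fluency = c("good", "very good", "bad", "bad", "very bad", "very good",
                         "good", "very good", "normal", "very bad"),
  survey_year = c(2007, 2008, 2009, 2009, 2008, 2007, 2007, 2008, 2007, 2008)
)
lvls <- c("very bad", "bad", "normal", "good", "very good")
data$linguistic_fluency <- factor(data$linguistic_fluency, levels = lvls)

# Copy of the data shifted forward one year: the answer given in year t-1
# is relabelled as year t so it can be matched on (ID, survey_year)
prev <- data.frame(ID = data$ID,
                   survey_year = data$survey_year + 1,
                   fluency_prev = data$linguistic_fluency)

# Inner join keeps only respondents observed in two consecutive years
pairs <- merge(data, prev, by = c("ID", "survey_year"))

# Rows: answer in year t; columns: answer in year t-1
transitions <- table(pairs$linguistic_fluency, pairs$fluency_prev,
                     dnn = c("t", "t-1"))
print(transitions)
```

Respondents with a single survey year, or with a gap between years, drop out of the join and so do not contribute to the table.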