Note: the title may be misleading. If you understand my problem and can think of something more descriptive, please change it.
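I have an odd situation in which the survey responses are all characters rather than numbers, and R really doesn't seem to like this. Suppose I asked the question: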
Q. In what area do you work?
East
West
Central
North
South
None of the above
But the respondents come only from East, West and Central.
dat <- rep(c("East", "West", "Central"), 100)
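Now, for presentation purposes, it is important that I include North, South and None of the above, even though nobody chose them. However, factoring these extra categories in is proving challenging.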
Let's try:
fac1 <- factor(dat, labels=c("East","West","Central","North","South","None of the above"))
Error in factor(dat, labels = c("East", "West", "Central", "North", "South", :
invalid labels; length 6 should be 1 or 3
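The error comes from how factor() chooses levels when none are given: it uses the sorted unique values of the data, so there are only three levels here, and labels must have length 1 or match that count. This minimal sketch reproduces the error (the names f, res and g are just for illustration). It also shows a quieter pitfall: even when the lengths match, labels are assigned to the alphabetically sorted levels, not in the order you typed them.

```r
dat <- rep(c("East", "West", "Central"), 100)

# With no 'levels' argument, factor() uses the sorted unique values of dat,
# so there are only three levels, in alphabetical order.
f <- factor(dat)
print(levels(f))

# Six labels for three levels triggers the "invalid labels" error; capture its message.
res <- tryCatch(
  factor(dat, labels = c("East", "West", "Central",
                         "North", "South", "None of the above")),
  error = function(e) conditionMessage(e)
)
print(res)

# Even with matching lengths, labels are applied to the sorted levels
# ("Central", "East", "West"), so every "East" response is relabelled "West".
g <- factor(dat, labels = c("East", "West", "Central"))
print(head(g, 3))
```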
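Basically, what I want to do is turn this data into a factor that also includes the missing categories, so that when I type something like summary(fac1) it shows 0 responses for each of those categories.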
There must be an easier way to do this!
Answer 0 (score: 3)
Almost there. You need to use the levels argument:
fac1 <- factor(dat, levels=c("East","West","Central","North","South","None of the above"))
str(fac1)
Factor w/ 6 levels "East","West",..: 1 2 3 1 2 3 1 2 3 1 ...
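Since the question's goal was for summary() to report zero counts for the unused categories, here is a small check of that. It assumes the same dat vector; survey_levels and counts are illustrative names, not part of the answer.

```r
dat <- rep(c("East", "West", "Central"), 100)
survey_levels <- c("East", "West", "Central", "North", "South", "None of the above")
fac1 <- factor(dat, levels = survey_levels)

# summary() on a factor counts every level, including the ones never observed.
counts <- summary(fac1)
print(counts)

# table() gives the same counts, in the order supplied to 'levels'.
print(table(fac1))
```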
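The difference between levels and labels is that levels defines the set of values the factor may take (including values that never occur in the data) and their order, while labels lets you rename those levels in one go. For example: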
fac2 <- factor(
dat,
levels=c("East","West","Central","North","South","None of the above"),
labels=c("E", "W", "C", "N", "S", "Other")
)
str(fac2)
Factor w/ 6 levels "E","W","C","N",..: 1 2 3 1 2 3 1 2 3 1 ...
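To confirm that labels are matched to levels by position, the following sketch rebuilds fac2 the same way and compares its levels, counts and underlying integer codes:

```r
dat <- rep(c("East", "West", "Central"), 100)
fac2 <- factor(
  dat,
  levels = c("East", "West", "Central", "North", "South", "None of the above"),
  labels = c("E", "W", "C", "N", "S", "Other")
)

# Labels replace levels position by position: "East" -> "E", ..., "None of the above" -> "Other".
print(levels(fac2))
print(table(fac2))

# The integer codes are the same as with levels alone; only the displayed names change.
print(head(as.integer(fac2)))
```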
Answer 1 (score: 2)
Not an expert, but does this help?
fac1 <- factor(dat, levels =
c("East","West","Central","North","South","None of the above"))
summary(fac1)
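One side effect of passing levels is worth checking with survey data. Any response that is not spelled exactly like one of the levels silently becomes NA instead of raising an error. The sketch below uses a hypothetical responses vector containing a lower-case "central"; survey_levels and unmatched are illustrative names.

```r
survey_levels <- c("East", "West", "Central", "North", "South", "None of the above")
# Hypothetical responses; "central" does not match the level "Central" exactly.
responses <- c("East", "West", "central", "North")
f <- factor(responses, levels = survey_levels)

# The unmatched value becomes NA; summary() reports it in an extra "NA's" entry.
print(summary(f))

# List the raw values that matched no level, so they can be cleaned up first.
unmatched <- setdiff(unique(responses), survey_levels)
print(unmatched)
```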