Question

I have a data frame whose columns encode information as numeric (0, 0.5, 1) or binary (0, 1) values that stand for various levels. Example:

df <- data.frame( dialogue = c(1, 0, 1, 1, 0), interlocutor = c(0, 1, 0.5, 0, 0.5))

#   dialogue interlocutor
# 1        1          0.0
# 2        0          1.0
# 3        1          0.5
# 4        1          0.0
# 5        0          0.5

The 0/1 values in dialogue mean the row IS NOT (0) or IS (1) a dialogue. The 0, 0.5 and 1 values in interlocutor correspond to System interlocutor, System + human interlocutor and Human interlocutor, respectively.

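Questions:

1. I only just learned that categorical, qualitative data like this, which I have encoded with numeric codes, is best represented as factors. Is that right? Or is there a better way to encode such data so that it is easy to work with (I am mainly interested in descriptive statistics)?

2. If so, how can I easily convert these numeric values to their corresponding qualitative meanings?

I saw this question and answer, so I thought of doing this: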

# create corresponding lookup tables
dialogue_types <- data.frame(index = c(0, 1), value = c("No dialogue", "Dialogue"))
interlocutor_types <- data.frame(
  index = c(0, 0.5, 1),
  value = c("System interlocutor", "System + human interlocutor", "Human interlocutor")
)

# replace values
dialogue_types[,2][df$dialogue]
interlocutor_types[,2][df$interlocutor]

Is this a suitable solution? Is there a better one? What is the best way to think about this problem?

Answer 1

Factors are indeed the appropriate choice for categorical variables.
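One reason to prefer them over the lookup attempted in the question: an expression like `interlocutor_types[,2][df$interlocutor]` treats the codes as positions, so 0 selects nothing and 0.5 is truncated to 0. If a lookup table is still wanted, a minimal sketch that matches on the code values with base R's `match()` might look like this (the names `positional` and `looked_up` are only illustrative):

```r
df <- data.frame(dialogue = c(1, 0, 1, 1, 0),
                 interlocutor = c(0, 1, 0.5, 0, 0.5))

interlocutor_types <- data.frame(
  index = c(0, 0.5, 1),
  value = c("System interlocutor", "System + human interlocutor", "Human interlocutor")
)

# Positional indexing: 0 and 0.5 select nothing, only the single 1 picks
# the first label, so the result has length 1 instead of 5
positional <- interlocutor_types$value[df$interlocutor]

# match() finds the row whose index equals each code, then that row's label is used
looked_up <- interlocutor_types$value[match(df$interlocutor, interlocutor_types$index)]
```

This works here because 0, 0.5 and 1 are exactly representable as doubles; codes that come out of arithmetic may need rounding before matching.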

The most obvious way to create a factor is the factor function.

df$dialogue_fac = factor(df$dialogue,
                     levels = c(0, 1), labels = c("No dialogue", "Dialogue"))
df$interlocutor_fac = factor(df$interlocutor,
                         levels = c(0, 0.5, 1), labels = c("System", "System + Human", "Human"))

Here I added new columns so that you can easily verify that it worked, but you could also overwrite the old columns with the new ones.
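Since the question is mainly about descriptive statistics, here is a small sketch of how the conversion could be checked and summarised with base R's `table()` and `prop.table()`. The variable names `check`, `counts`, `props` and `cross` are only illustrative.

```r
df <- data.frame(dialogue = c(1, 0, 1, 1, 0),
                 interlocutor = c(0, 1, 0.5, 0, 0.5))
df$dialogue_fac <- factor(df$dialogue, levels = c(0, 1),
                          labels = c("No dialogue", "Dialogue"))
df$interlocutor_fac <- factor(df$interlocutor, levels = c(0, 0.5, 1),
                              labels = c("System", "System + Human", "Human"))

# Cross-tabulate the numeric codes against the labels to confirm
# that each code received the intended label
check <- table(df$interlocutor, df$interlocutor_fac)

# Counts and proportions per level of one factor
counts <- table(df$dialogue_fac)
props <- prop.table(counts)

# Two-way frequency table of the two factors
cross <- table(df$dialogue_fac, df$interlocutor_fac)
```

Printing `check` should show non-zero counts only where a code lines up with its own label, which is a quick way to catch a mismatched `levels`/`labels` order.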
