Question

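One question in my data frame asks people why they use the service. They ticked every reason that applied, and I then loaded the data into R. R reads each person's set of answers as a single string. I want to convert these into a factor with levels. The variable is called "Why_use_EPNET", and the levels I'm after are "Discussing topics of interest", "Resource gathering", "Watching others discuss topics of interest", and so on.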

Why_use_EPNET
"Discussing topics of interest, Watching others discuss topics of interest"
"Discussing topics of interest, Resource gathering"
"Resource gathering"
"Resource gathering"
"Watching others discuss topics of interest"

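R is currently reading each participant's answer as a character vector. Ideally, I'd like to convert the variable into a factor with distinct levels, so that R recognizes it as: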

Why_use_EPNET
1,3
1,2
2
2
3

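If that isn't possible, I'll break each participant's individual reasons out into binary (yes/no) variables and analyze those instead: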

Use_EPNET_for_Resource_gathering
Yes
Yes
Yes
No

Use_EPNET_for_Watching_others_discuss
No
Yes
No
Yes
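For reference, that yes/no fallback can be built directly from the comma-separated strings. A minimal base-R sketch, assuming each answer is one string with reasons separated by ", " as in the sample above; the names `answers`, `reasons` and `yes_no` are illustrative, and the `Use_EPNET_for_...` column names simply mirror the naming used above:

```r
# Sample answers: one comma-separated string per respondent
answers <- c(
  "Discussing topics of interest, Watching others discuss topics of interest",
  "Discussing topics of interest, Resource gathering",
  "Resource gathering",
  "Resource gathering",
  "Watching others discuss topics of interest")

# Assumed full set of possible reasons (adjust to the actual survey)
reasons <- c("Discussing topics of interest",
             "Resource gathering",
             "Watching others discuss topics of interest")

# Split each answer into its individual reasons
split_answers <- strsplit(answers, ", *")

# Column names such as "Use_EPNET_for_Resource_gathering"
col_names <- paste0("Use_EPNET_for_", gsub(" ", "_", reasons))

# One Yes/No factor column per reason: "Yes" if the respondent chose it
yes_no <- as.data.frame(
  setNames(
    lapply(reasons, function(r) {
      chosen <- vapply(split_answers, function(a) r %in% a, logical(1))
      factor(ifelse(chosen, "Yes", "No"), levels = c("No", "Yes"))
    }),
    col_names))

yes_no
```

Fixing `levels = c("No", "Yes")` keeps both levels present even in a column where nobody (or everybody) chose a reason.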

Any ideas would be appreciated.

Answer 1

Assuming your current data looks something like this:

k <- c(
  "Discussing topics of interest, Watching others discuss topics of interest",
  "Discussing topics of interest, Resource gathering",
  "Resource gathering",
  "Resource gathering",
  "Watching others discuss topics of interest")

Why_use_EPNET <- strsplit(k, ", *")
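The regular expression `", *"` splits on a comma followed by zero or more spaces, so each list element becomes a character vector of individual reasons. A quick check on the sample `k` above; the trimmed variant `Why_use_EPNET2` is an assumption-driven alternative for real data that may have stray spaces on either side of the commas:

```r
# The sample answers from above, one string per respondent
k <- c(
  "Discussing topics of interest, Watching others discuss topics of interest",
  "Discussing topics of interest, Resource gathering",
  "Resource gathering",
  "Resource gathering",
  "Watching others discuss topics of interest")

# ", *" matches a comma followed by zero or more spaces
Why_use_EPNET <- strsplit(k, ", *")

# How many reasons each respondent gave
lengths(Why_use_EPNET)  # 2 2 1 1 1

# More forgiving variant: split on the bare comma, then trim whitespace
Why_use_EPNET2 <- lapply(strsplit(k, ","), trimws)

# Both approaches agree on this sample
identical(Why_use_EPNET, Why_use_EPNET2)
```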

We can use factor() to convert the answers into categories, supplying the vector of levels.

lv <- c("Discussing topics of interest", 
        "Resource gathering",
        "Watching others discuss topics of interest")

Why_use_EPNET.o <- lapply(Why_use_EPNET, factor, levels=lv)
str(Why_use_EPNET.o)
# List of 5
#  $ : Factor w/ 3 levels "Discussing topics of interest",..: 1 3
#  $ : Factor w/ 3 levels "Discussing topics of interest",..: 1 2
#  $ : Factor w/ 3 levels "Discussing topics of interest",..: 2
#  $ : Factor w/ 3 levels "Discussing topics of interest",..: 2
#  $ : Factor w/ 3 levels "Discussing topics of interest",..: 3
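Passing `levels = lv` explicitly keeps the coding identical for every respondent: "Resource gathering" is always code 2, whoever chose it. One consequence worth checking is that any answer not exactly matching an entry of `lv` (a typo, different capitalization, a write-in "Other") silently becomes `NA`. A small sketch using a hypothetical mistyped answer; `odd` and `odd.f` are illustrative names:

```r
lv <- c("Discussing topics of interest",
        "Resource gathering",
        "Watching others discuss topics of interest")

# Hypothetical respondent whose second reason is capitalized differently
odd <- list(c("Resource gathering", "resource Gathering"))
odd.f <- lapply(odd, factor, levels = lv)

# The unmatched reason is stored as NA
odd.f[[1]]

# Flag respondents with at least one reason that matched no level
sapply(odd.f, anyNA)
```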

Then you can use as.numeric() to get the numeric codes:

lapply(Why_use_EPNET.o, as.numeric)
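To get the exact layout from the question (one value such as "1,3" per respondent), the codes can be pasted together. A minimal sketch; `codes` and `Why_use_EPNET_codes` are illustrative names, and `as.integer()` returns the same level codes as `as.numeric()`, just as integers. Note that the result is a plain character vector, so for analysis the list of factors or separate yes/no columns are likely more convenient:

```r
k <- c(
  "Discussing topics of interest, Watching others discuss topics of interest",
  "Discussing topics of interest, Resource gathering",
  "Resource gathering",
  "Resource gathering",
  "Watching others discuss topics of interest")
lv <- c("Discussing topics of interest",
        "Resource gathering",
        "Watching others discuss topics of interest")

Why_use_EPNET.o <- lapply(strsplit(k, ", *"), factor, levels = lv)

# Level code of each chosen reason (its position in lv)
codes <- lapply(Why_use_EPNET.o, as.integer)

# Collapse each respondent's codes into a single string such as "1,3"
Why_use_EPNET_codes <- vapply(codes, paste, character(1), collapse = ",")
Why_use_EPNET_codes
```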
