Combining multiple character values of a variable and recoding them as 0 and 1

Asked: 2018-12-03 21:11:03

Tags: r dplyr

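I am trying to create two new variables from a long list of values and recode them as 0 and 1 so that I can use them in a logistic regression. Specifically, the OS1 variable in my survey records the operating system on which the respondent completed the survey. I want to recode the mobile operating systems as "mobile" and the desktop ones as "pc". I tried dplyr::case_when(), but it does not seem to behave the way it did for my other variables, which did not need to be grouped into subcategories. My goal is to do this inside the pipe shown below.

As an example, here is how I filter for "Campus A" and the incentive types, and then use dplyr::case_when() to create three new columns (finished, grade, and incentive).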

survey <- seru %>%
  select(FINISHED, WC001_INCENTIVE, LEVEL, OS1, CAMPUS_Supplemental) %>%
  filter(CAMPUS_Supplemental == "Campus") %>%
  filter(WC001_INCENTIVE %in% c(
    "A chance to win one of ten $100 Visa gift cards",
    "A chance to win one of three $500 Visa gift cards",
    "I wanted my opinions to be heard by faculty, staff, and the administration"
  )) %>%
  mutate(finished = factor(dplyr::case_when(
    FINISHED == "0" ~ 0,
    FINISHED == "1" ~ 1
  ), levels = c(0:1), labels = c("No", "Yes"))) %>%
  mutate(grade = factor(dplyr::case_when(
    LEVEL == "Freshman" ~ 0,
    LEVEL == "Sophomore" ~ 1,
    LEVEL == "Junior" ~ 2,
    LEVEL == "Senior" ~ 3
  ), levels = c(0:3), labels = c("freshman", "sophomore", "junior", "senior"))) %>%
  mutate(incentive = factor(dplyr::case_when(
    WC001_INCENTIVE == "A chance to win one of ten $100 Visa gift cards" ~ 0,
    WC001_INCENTIVE == "A chance to win one of three $500 Visa gift cards" ~ 1,
    WC001_INCENTIVE == "I wanted my opinions to be heard by faculty, staff, and the administration" ~ 2
  ), levels = c(0:2), labels = c("$100 gift card", "$500 gift card", "Opinion heard")))

Here is the structure of the data frame. Again, I mutated FINISHED, LEVEL, and WC001_INCENTIVE into the new variables (finished, grade, and incentive).

 str(survey)
 'data.frame':  4999 obs. of  8 variables:
 $ FINISHED           : int  1 1 1 0 1 1 0 1 1 0 ...
 $ WC001_INCENTIVE    : Factor w/ 6 levels " ","  Strongly agree",..: 4 4 4 4 4 3 5 5 4 4 ...
 $ LEVEL              : Factor w/ 5 levels "","Freshman",..: 3 2 5 2 4 2 5 2 5 2 ...
 $ OS1                : Factor w/ 44 levels " ","Android 4.1.2",..: 12 37 34 31 40 31 12 37 37 31 ...
 $ CAMPUS_Supplemental: Factor w/ 5 levels "","Campus A","Campus B",..: 3 3 3 3 3 3 3 3 3 3 ...
 $ finished           : Factor w/ 2 levels "No","Yes": 2 2 2 1 2 2 1 2 2 1 ...
 $ grade              : Factor w/ 4 levels "freshman","sophomore",..: 3 1 2 1 4 1 2 1 2 1 ...
 $ incentive          : Factor w/ 3 levels "$100 gift card",..: 2 2 2 2 2 1 3 3 2 2 ...

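Just as I did for incentive and grade, I want to create two new variables, "mobile" and "pc", derived from OS1; that is, collapse all of the mobile operating systems into one category alongside the PC operating systems. I looked at other threads, but they all point to creating a variable with the c() function. I need the new variables to be derived from OS1, so the solution has to fit into the pipe above.

Mobile: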

 c("iPhone", "Windows Phone 10.0", "Windows Phone 8.1",
 "Android 4.1.2", "Android 4.3", "Android 4.4.2", "Android 4.4.4",    
 "Android 5.0", "Android 5.0.1", "Android 5.0.2", "Android 5.1", 
 "Android 5.1.1", "Android 6.0", "Android 6.0.1", "Android 7.0", 
 "Android 7.1.1", "Android 7.1.2")

PC:

 c("Windows NT 10.0", "Windows NT 5.1", "Windows NT 6.0", "Windows NT 6.1",
 "Windows NT 6.2", "Windows NT 6.3", "Macintosh")

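The end goal is a logistic regression in which OS1 has two levels: mobile and PC. In other words, did the device (phone or personal computer) affect whether the respondent completed the survey?

Thanks!

2 answers:

Answer 0: (score: 0)

This creates two new columns, mobile and pc, coded as TRUE/FALSE: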

library(tidyverse)

MobileOS <- c("iPhone", "Windows Phone 10.0", "Windows Phone 8.1",
              "Android 4.1.2", "Android 4.3", "Android 4.4.2", "Android 4.4.4",    
              "Android 5.0", "Android 5.0.1", "Android 5.0.2", "Android 5.1", 
              "Android 5.1.1", "Android 6.0", "Android 6.0.1", "Android 7.0", 
              "Android 7.1.1", "Android 7.1.2")

PCOS <- c("Windows NT 10.0", "Windows NT 5.1", "Windows NT 6.0", "Windows NT 6.1", 
          "Windows NT 6.2", "Windows NT 6.3", "Macintosh")

seru %>%
  mutate(mobile = OS1 %in% MobileOS,
         pc = OS1 %in% PCOS)
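
The question's end goal is a single two-level predictor rather than two logical columns. As a minimal sketch, assuming a small invented data frame `toy` in place of `seru` and shortened OS lists (the `toy`, `device`, and `mobile01` names are hypothetical), the same `%in%` test could feed `case_when()` inside the pipe:

```r
library(dplyr)

# Shortened OS lists, for illustration only
MobileOS <- c("iPhone", "Android 7.0", "Windows Phone 8.1")
PCOS <- c("Windows NT 10.0", "Macintosh")

# Hypothetical toy data standing in for `seru`; OS1 is a factor, as in the question
toy <- data.frame(
  FINISHED = c(1, 0, 1, 1, 0, 1),
  OS1 = factor(c("iPhone", "Macintosh", "Android 7.0",
                 "Windows NT 10.0", " ", "Windows Phone 8.1"))
)

toy_recoded <- toy %>%
  mutate(
    # Rows matching neither list fall through case_when() and become NA
    device = factor(case_when(
      OS1 %in% MobileOS ~ "mobile",
      OS1 %in% PCOS ~ "pc"
    ), levels = c("pc", "mobile")),
    # 0/1 indicator: 1 = mobile, 0 = pc, NA = unclassified
    mobile01 = as.integer(device == "mobile")
  )

# On the real data, the model could then be fitted along these lines:
# glm(FINISHED ~ device, data = toy_recoded, family = binomial)
```

Putting "pc" first in `levels` makes it the reference category, so the coefficient would describe mobile relative to PC. Operating systems that appear in neither list end up as NA and would be dropped by `glm()` under its default NA handling.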

Answer 1: (score: 0)

I would probably just solve this with the powerful %in% operator, like so:

mobile <- c("iPhone", 
            "Windows Phone 10.0", "Windows Phone 8.1", 
            "Android 4.1.2", "Android 4.3", "Android 4.4.2", "Android 4.4.4", 
            "Android 5.0", "Android 5.0.1", "Android 5.0.2", "Android 5.1", "Android 5.1.1", 
            "Android 6.0", "Android 6.0.1", 
            "Android 7.0", "Android 7.1.1", "Android 7.1.2")

pc <- c("Windows NT 10.0", "Windows NT 5.1", 
        "Windows NT 6.0", "Windows NT 6.1", "Windows NT 6.2", "Windows NT 6.3", 
        "Macintosh")

os <- c(mobile, pc)

newos <- ifelse(os %in% mobile, "mobile", ifelse(os %in% pc, "pc", NA))

Edit: mine is basically a base-R version of what Jordo82 did above.
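
The snippet in this answer recodes a standalone vector `os`. A rough sketch of the same nested `ifelse()` applied to a data frame column follows, assuming an invented `toy` data frame and shortened lists (neither comes from the original thread):

```r
# Shortened OS lists, for illustration only
mobile <- c("iPhone", "Android 7.0")
pc <- c("Windows NT 10.0", "Macintosh")

# Hypothetical stand-in for the survey data; OS1 is a factor, as in the question
toy <- data.frame(OS1 = factor(c("iPhone", "Macintosh", " ", "Android 7.0")))

# %in% compares factor values by their labels, so no as.character() is needed
toy$newos <- ifelse(toy$OS1 %in% mobile, "mobile",
                    ifelse(toy$OS1 %in% pc, "pc", NA))

# Convert to a two-level factor with "pc" as the reference level
toy$newos <- factor(toy$newos, levels = c("pc", "mobile"))

# Count each category, including unclassified rows
table(toy$newos, useNA = "ifany")
```

Any value in neither list, such as the blank `" "` level here, becomes NA instead of being silently assigned to one group.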