Conditional filtering with scoped filters in the devel version of dplyr

Time: 2017-05-04 03:43:46

Tags: r dplyr tidyverse

library(dplyr)  # devel version, soon-to-be released 0.6
library(tidyr)

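Below is a simple dataset. The Q1Sat-Q3Sat variables are satisfaction levels, and the Q1Used-Q3Used variables indicate whether the survey respondent used the item they rated. These questions were asked together in the survey. The real dataset contains at least 50 Sat and Used variables.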

Q1Sat<-c("Neutral","Neutral","VSat","Sat","Neutral","Sat","VDis","Sat","Sat","VSat")
Q2Sat<-c("Neutral","VSat","Dis","Dis","VDis","Sat","Sat","VSat","Neutral","Dis")
Q3Sat<-c("Sat","Sat","Diss","Neutral","VSat","VDis","Sat","Sat","Sat","Neutral")
Q3Used<-c("Yes","No","Yes","Yes","Yes","Yes","Yes","Yes","Yes","No")
Q2Used<-c("Yes","Yes","Yes","Yes","No","No","Yes","Yes","Yes","Yes")
Q1Used<-c("Yes","Yes","Yes","No","No","Yes","Yes","Yes","No","Yes")
House<-c("Yes","No","Unsure","Yes","Yes","No","Unsure","Unsure","Yes","Yes")

Test<-data_frame(Q1Sat,Q2Sat,Q3Sat,Q1Used,Q2Used,Q3Used,House)

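I would like to use the code below to restructure the data into a table of percentages. However, I need to filter so that the Q1Used-Q3Used variables include only "Yes" and the House variable includes only "Yes". As mentioned, Q1Sat is tied to Q1Used, so Q1Sat should be included only when Q1Used is "Yes" and House is "Yes". I need to do the same for Q2Sat and Q3Sat.

However, I'm stuck on how to do this. I tried the scoped filters in the devel version of dplyr, but I'm not sure how to use them with multiple variables: Q1Used:Q3Used, as well as House.

So how do I add a filter on House to the scoped filter in the code below, so that rows where House != "Yes" are dropped?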
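To make the per-question condition concrete, here is a minimal sketch for Q1 alone, assuming a plain `filter()` is acceptable when only one question is involved. It rebuilds just the three columns it needs from the sample data; the name `q1` is illustrative and not part of the original code.

```r
library(dplyr)

# Q1 columns and House, copied from the sample data
Test <- data.frame(
  Q1Sat  = c("Neutral","Neutral","VSat","Sat","Neutral","Sat","VDis","Sat","Sat","VSat"),
  Q1Used = c("Yes","Yes","Yes","No","No","Yes","Yes","Yes","No","Yes"),
  House  = c("Yes","No","Unsure","Yes","Yes","No","Unsure","Unsure","Yes","Yes"),
  stringsAsFactors = FALSE
)

# Keep Q1Sat only for respondents who used item 1 and answered House == "Yes",
# then turn the counts into proportions
q1 <- Test %>%
  filter(Q1Used == "Yes", House == "Yes") %>%
  count(Q1Sat) %>%
  mutate(perc = round(n / sum(n), 2))
```

The difficulty in the question is repeating this for every Sat/Used pair without dropping a respondent's Q2Sat just because they did not use item 1.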

Test %>%
  filter_at(vars(Q1Used:Q3Used), all_vars(. != 1)) %>%
  select(Q1Sat:Q3Sat) %>%
  gather() %>%
  count(key, value) %>%
  mutate(perc = round(n / sum(n), 2)) %>%
  select(-n) %>%
  spread(value, perc)
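For reference, a scoped filter can be chained with an ordinary `filter()`. The sketch below assumes the goal is the strict, row-wise reading: keep a respondent only if House is "Yes" and every Used column is "Yes". The name `all_used` is illustrative. Note that `filter_at()` and `all_vars()` were superseded in later dplyr releases, so this reflects the 0.6/0.7-era API the question refers to.

```r
library(dplyr)

# Used columns and House, copied from the sample data
Test <- data.frame(
  Q1Used = c("Yes","Yes","Yes","No","No","Yes","Yes","Yes","No","Yes"),
  Q2Used = c("Yes","Yes","Yes","Yes","No","No","Yes","Yes","Yes","Yes"),
  Q3Used = c("Yes","No","Yes","Yes","Yes","Yes","Yes","Yes","Yes","No"),
  House  = c("Yes","No","Unsure","Yes","Yes","No","Unsure","Unsure","Yes","Yes"),
  stringsAsFactors = FALSE
)

# Ordinary filter for House, scoped filter across the Used columns
all_used <- Test %>%
  filter(House == "Yes") %>%
  filter_at(vars(Q1Used:Q3Used), all_vars(. == "Yes"))
```

This row-wise filter is stricter than the per-question requirement described above: it discards a respondent's Q1Sat whenever any other item was not used, so it is probably not what the final table needs.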

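1 Answer:

Answer 0: (score: 0)

A solution that does not need the devel version. The general idea is that we recode the unwanted values to NA instead of filtering.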
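Read literally, the recode-to-NA idea could look like the sketch below. It is an illustration of the idea, not the answer's own code: each Sat value is blanked out unless its matching Used column and House are both "Yes", and the NAs are dropped when reshaping. The name `recoded` is illustrative, and writing one `ifelse()` per question is an assumption made for brevity that would not scale to 50 questions.

```r
library(dplyr)
library(tidyr)

# Full sample data from the question
Test <- data.frame(
  Q1Sat  = c("Neutral","Neutral","VSat","Sat","Neutral","Sat","VDis","Sat","Sat","VSat"),
  Q2Sat  = c("Neutral","VSat","Dis","Dis","VDis","Sat","Sat","VSat","Neutral","Dis"),
  Q3Sat  = c("Sat","Sat","Diss","Neutral","VSat","VDis","Sat","Sat","Sat","Neutral"),
  Q1Used = c("Yes","Yes","Yes","No","No","Yes","Yes","Yes","No","Yes"),
  Q2Used = c("Yes","Yes","Yes","Yes","No","No","Yes","Yes","Yes","Yes"),
  Q3Used = c("Yes","No","Yes","Yes","Yes","Yes","Yes","Yes","Yes","No"),
  House  = c("Yes","No","Unsure","Yes","Yes","No","Unsure","Unsure","Yes","Yes"),
  stringsAsFactors = FALSE
)

recoded <- Test %>%
  # Blank out each rating unless the item was used and House is "Yes"
  mutate(
    Q1Sat = ifelse(Q1Used == "Yes" & House == "Yes", Q1Sat, NA),
    Q2Sat = ifelse(Q2Used == "Yes" & House == "Yes", Q2Sat, NA),
    Q3Sat = ifelse(Q3Used == "Yes" & House == "Yes", Q3Sat, NA)
  ) %>%
  select(Q1Sat:Q3Sat) %>%
  # Long format; na.rm = TRUE drops the recoded values
  gather(key, value, na.rm = TRUE) %>%
  count(key, value) %>%
  # Proportions within each question, not across the whole table
  group_by(key) %>%
  mutate(perc = round(n / sum(n), 2)) %>%
  ungroup()
```

Piping `recoded %>% select(-n) %>% spread(value, perc)` would give the wide table of percentages the question asks for.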

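# Long format: one row per respondent and Sat question, keeping House
sat = Test %>% select(Q1Sat:Q3Sat, House) %>%
      gather(key_sat, Sat, -House)
# Long format: one row per respondent and Used question
used = Test %>% select(Q1Used:Q3Used) %>%
    gather(key_used, Used)

# Rows line up because both tables were gathered in Q1, Q2, Q3 order
cbind(used, sat) %>% 
    group_by(key_sat) %>% 
    mutate(
        # 1 if the response counts (item used and House is "Yes"), else 0
        value = ((Used != "No") & (House == "Yes")) * 1,
        # number of counted responses for this question
        base = sum(value)
    ) %>% 
    group_by(key_sat, Sat) %>% 
    summarise(perc = sum(value)/sum(base[1])) %>% 
    spread(Sat,perc)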