Counting rows that satisfy two conditions at once

Time: 2020-07-06 19:41:46

Tags: r excel dataframe subset

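I am running a survey in which participants answer a first question, yes or no, and then a second, open-ended question: "If yes, why?"

I need to find the percentage of people who answered the second question after answering "yes". Alternatively, I need to find the number of "NA" values that follow a "yes".

Here is a dataset that looks similar: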


#>   helpful helpfulhow
#> 1       n NA
#> 2       y Because this study cannot be put online. Thus I have to create a random wall of text
#> 3       n NA
#> 4       y This is a confidential study. Thus the data must be changed.
#> 5       n NA
#> 6       n NA
#> 7       y This is a confidential study. Thus the data must be changed every time.
#> 8       y NA
#> 9       y Qualitative studies are difficult to assess. Here is a random wall of text.

> str(b)
'data.frame':	9 obs. of  2 variables:
 $ helpful   : Factor w/ 2 levels "n","y": 1 2 1 2 1 1 2 2 2
 $ helpfulhow: Factor w/ 4 levels "Because this study cannot be put online. Thus I have to create a random wall of text.",..: NA 1 NA 4 NA NA 3 NA 2

So, for example, I want to find out how many people have a 'y' under helpful and also an 'NA' under helpfulhow. Thanks in advance.

2 answers:

Answer 0 (score: 2)

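I made a sample dataset, shown below. Here I count the rows where question 1 is answered "Yes" and question 2 is neither blank (trimws strips the whitespace) nor NA. Dividing that count by the number of "Yes" rows gives the fraction, and percent from the scales package converts it to a percentage.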

#>      Name  Q1               Q2
#> 1   Jerry Yes             <NA>
#> 2    Beth  No                 
#> 3 Jessica Yes                 
#> 4   Morty Yes       Aww,Babola
#> 5  Summer  No                 
#> 6    Rick Yes Wubbalubbadubdub


## percentage of people who answered yes to Q1 and also answered Q2
scales::percent(
  nrow(with(df, df[Q1 == "Yes" & (trimws(Q2) != "" & !is.na(Q2)), ])) /
    nrow(with(df, df[Q1 == "Yes", ]))
)

#> [1] "50.0%"

Data:

df <- structure(list(Name = structure(c(2L, 1L, 3L, 4L, 6L, 5L), 
                                      .Label = c("Beth", "Jerry", "Jessica", "Morty", "Rick", "Summer"), class = "factor"), 
                     Q1 = structure(c(2L, 1L, 2L, 2L, 1L, 2L), 
                                    .Label = c("No", "Yes"), class = "factor"), 
                     Q2 = structure(c(NA, 1L, 2L, 3L, 1L, 4L), 
                                    .Label = c("", "       ", "Aww,Babola", "Wubbalubbadubdub"), class = "factor")), 
                class = "data.frame", row.names = c(NA, -6L))
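
The counting step in this answer can also be written with logical vectors, so that nothing is subset just to be counted. The following is only a sketch under assumptions: the sample data is re-typed as plain character columns (the answer stores factors), and the names df_chr, yes, answered, n_yes_answered and frac_answered are illustrative, not part of the original answer.

```r
# Sample data from this answer, re-typed with character columns
# (an assumption; the original df stores these columns as factors).
df_chr <- data.frame(
  Name = c("Jerry", "Beth", "Jessica", "Morty", "Summer", "Rick"),
  Q1   = c("Yes", "No", "Yes", "Yes", "No", "Yes"),
  Q2   = c(NA, "", "       ", "Aww,Babola", "", "Wubbalubbadubdub"),
  stringsAsFactors = FALSE
)

# One logical vector per condition
yes      <- df_chr$Q1 == "Yes"                            # answered "Yes" to Q1
answered <- !is.na(df_chr$Q2) & trimws(df_chr$Q2) != ""   # non-blank, non-NA Q2

# sum() counts TRUE values; mean() of a logical vector is the share of TRUE
n_yes_answered <- sum(yes & answered)   # rows meeting both conditions
frac_answered  <- mean(answered[yes])   # share of "Yes" rows with a Q2 answer
```

Because TRUE counts as 1 and FALSE as 0, sum() and mean() give the count and the fraction directly, and the fraction can then be passed to scales::percent() as in the answer.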

For your dataset, it would look like this:

scales::percent(nrow(with(b, b[helpful=="y" & (trimws(helpfulhow) != "" & !is.na(helpfulhow)),]))/nrow(with(b, b[helpful=="y",])))

#> [1] "100%"

To make it tidier, we can use the dplyr package:

library(dplyr)
library(scales)

percent(
  b %>% 
    filter(helpful == "y", !is.na(helpfulhow), trimws(helpfulhow) != "") %>% 
    nrow(.) / {b %>% filter(helpful == "y") %>% nrow(.)})

#> [1] "100%"

b %>% 
  group_by(helpful) %>% 
  summarise(percent_helpfulhow = percent(sum(trimws(helpfulhow) != "" & !is.na(helpfulhow)) / n())) %>% 
  filter(helpful == "y") %>% 
  pull(2)

#> [1] "100%"
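
The question also asks for the number of NA values that follow a "y". A minimal base R sketch, assuming the nine rows exactly as printed in the question, re-typed here as character columns; b_chr, is_yes, n_yes_na and pct_yes_ok are illustrative names. On these nine rows one "y" respondent (row 8) left helpfulhow as NA, so the share comes out below the 100% reported above, which may have been computed on a different copy of the data.

```r
# The nine rows printed in the question, re-typed as character columns
# (an assumption: the original b stores both columns as factors).
b_chr <- data.frame(
  helpful = c("n", "y", "n", "y", "n", "n", "y", "y", "y"),
  helpfulhow = c(
    NA,
    "Because this study cannot be put online. Thus I have to create a random wall of text.",
    NA,
    "This is a confidential study. Thus the data must be changed.",
    NA,
    NA,
    "This is a confidential study. Thus the data must be changed every time.",
    NA,
    "Qualitative studies are difficult to assess. Here is a random wall of text."
  ),
  stringsAsFactors = FALSE
)

is_yes <- b_chr$helpful == "y"

# Number of "y" rows whose follow-up answer is NA
n_yes_na <- sum(is_yes & is.na(b_chr$helpfulhow))

# Percentage of "y" rows that did give a follow-up answer
pct_yes_ok <- 100 * mean(!is.na(b_chr$helpfulhow[is_yes]))
```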

Answer 1 (score: 2)

Here is a solution using the dplyr and janitor packages:

library(dplyr)
library(janitor)

df %>% 
  mutate(na_flag = ifelse(helpful == 'y' & is.na(helpfulhow), "Y", "N")) %>% 
  tabyl(na_flag) %>% 
  adorn_pct_formatting

Which gives us:

 na_flag n percent
       N 6  100.0%

If every response to helpfulhow in this sample dataset (n = 6) were NA, it would show:

 na_flag n percent
       N 4   66.7%
       Y 2   33.3%

This is because two respondents answered y to helpful but gave no response to helpfulhow.

If you only want to look at the y respondents, you can do the following:
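
The flag-and-tabulate idea can be reproduced without extra packages. This is a sketch on hypothetical data: toy is a made-up six-row data frame in which every helpfulhow is NA and two respondents answered y, matching the scenario just described; base R table() and prop.table() stand in for tabyl() and adorn_pct_formatting().

```r
# Hypothetical six-row data: every helpfulhow is NA, two respondents said "y"
toy <- data.frame(
  helpful    = c("n", "y", "n", "y", "n", "n"),
  helpfulhow = rep(NA_character_, 6),
  stringsAsFactors = FALSE
)

# Flag rows that satisfy both conditions: answered "y" AND gave no reason
na_flag <- ifelse(toy$helpful == "y" & is.na(toy$helpfulhow), "Y", "N")

counts <- table(na_flag)                       # rows per flag value
shares <- round(100 * prop.table(counts), 1)   # same split as percentages
```

Here counts holds 4 rows flagged N and 2 flagged Y, and shares holds the corresponding 66.7 and 33.3 percent.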

df %>% 
  filter(helpful == "y") %>%
  mutate(na_flag = ifelse(is.na(helpfulhow), "Y", "N")) %>% 
  tabyl(na_flag) %>% 
  adorn_pct_formatting