Question

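I am running a survey in which participants answer a first yes/no question and then a second, open-ended question: "If yes, why?"

I need to find the percentage of people who answered the second question after answering "yes" to the first. Alternatively, I need to count the NA values that follow a "yes".

Here is a dataset that looks similar: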

#>   helpful helpfulhow
#> 1       n NA
#> 2       y Because this study cannot be put online. Thus I have to create a random wall of text
#> 3       n NA
#> 4       y This is a confidential study. Thus the data must be changed.
#> 5       n NA
#> 6       n NA
#> 7       y This is a confidential study. Thus the data must be changed every time.
#> 8       y NA
#> 9       y Qualitative studies are difficult to assess. Here is a random wall of text.

> str(b)
'data.frame': 9 obs. of 2 variables:
 $ helpful   : Factor w/ 2 levels "n","y": 1 2 1 2 1 1 2 2 2
 $ helpfulhow: Factor w/ 4 levels "Because this study cannot be put online. Thus I have to create a random wall of text.",..: NA 1 NA 4 NA NA 3 NA 2

So, for example, I want to find out how many people put 'y' under helpful and also have 'NA' under helpfulhow. Thanks in advance.

Answer 1

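I made a sample dataset, shown below. I count the rows where Q1 is "Yes" and Q2 is neither blank (using trimws to strip whitespace) nor NA, then divide by the number of rows where Q1 is "Yes" to get the fraction. percent from the scales package converts that to a percentage.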

#>      Name  Q1               Q2
#> 1   Jerry Yes             <NA>
#> 2    Beth  No                 
#> 3 Jessica Yes                 
#> 4   Morty Yes       Aww,Babola
#> 5  Summer  No                 
#> 6    Rick Yes Wubbalubbadubdub


## percentage of people who answered yes to Q1 and also answered Q2
scales::percent(
  nrow(with(df, df[Q1 == "Yes" & (trimws(Q2) != "" & !is.na(Q2)), ])) /
    nrow(with(df, df[Q1 == "Yes", ]))
)

#> [1] "50.0%"

Data:

df <- structure(list(Name = structure(c(2L, 1L, 3L, 4L, 6L, 5L), 
                                      .Label = c("Beth", "Jerry", "Jessica", "Morty", "Rick", "Summer"), class = "factor"), 
                     Q1 = structure(c(2L, 1L, 2L, 2L, 1L, 2L), 
                                    .Label = c("No", "Yes"), class = "factor"), 
                     Q2 = structure(c(NA, 1L, 2L, 3L, 1L, 4L), 
                                    .Label = c("", "       ", "Aww,Babola", "Wubbalubbadubdub"), class = "factor")), 
                class = "data.frame", row.names = c(NA, -6L))
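
The same fraction can also be sketched in base R by taking the mean of a logical vector over the "Yes" rows. The helper name answered_share is hypothetical, and the data frame below rebuilds the sample data with character columns instead of factors, which is an assumption made for brevity.

```r
# Hypothetical helper: share of rows with a real follow-up answer,
# among the rows whose first answer equals `yes_value`.
answered_share <- function(first, followup, yes_value = "Yes") {
  first    <- as.character(first)      # accept factors as well
  followup <- as.character(followup)
  is_yes   <- !is.na(first) & first == yes_value
  # A follow-up counts as answered only if it is neither NA nor blank
  answered <- !is.na(followup) & trimws(followup) != ""
  mean(answered[is_yes])               # TRUE counts as 1, FALSE as 0
}

# Sample data rebuilt as plain character columns (assumption: not factors)
df <- data.frame(
  Name = c("Jerry", "Beth", "Jessica", "Morty", "Summer", "Rick"),
  Q1   = c("Yes", "No", "Yes", "Yes", "No", "Yes"),
  Q2   = c(NA, "", "       ", "Aww,Babola", "", "Wubbalubbadubdub"),
  stringsAsFactors = FALSE
)

share <- answered_share(df$Q1, df$Q2)
share
#> [1] 0.5
```

Because the result is a plain number, it can be passed to scales::percent() or formatted with sprintf() as needed.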

For your dataset, it would look like this:

scales::percent(
  nrow(with(b, b[helpful == "y" & (trimws(helpfulhow) != "" & !is.na(helpfulhow)), ])) /
    nrow(with(b, b[helpful == "y", ]))
)

#> [1] "100%"

To make it tidier, we can use the dplyr package:

library(dplyr)
library(scales)

percent(
  b %>% 
    filter(helpful == "y", !is.na(helpfulhow), trimws(helpfulhow) != "") %>% 
    nrow(.) / {b %>% filter(helpful == "y") %>% nrow(.)})

#> [1] "100%"

Or:

b %>% 
  group_by(helpful) %>% 
  summarise(percent_helpfulhow = percent(sum(trimws(helpfulhow) != "" & !is.na(helpfulhow)) / n())) %>% 
  filter(helpful == "y") %>% 
  pull(2)

#> [1] "100%"
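
To check the numbers against the 9-row b shown in the question, the data can be rebuilt and counted in base R. This is a sketch: the data frame is reconstructed from the printed rows and the str(b) output, not taken from the asker's code, and the variable names are illustrative. On this reconstruction one of the five y rows has NA in helpfulhow, so the share comes out below 100%; the outputs shown above may reflect a different version of the data.

```r
# Reconstruction of the 9-row data frame printed in the question
# (assumption: rebuilt from the printed rows, not the asker's original code)
b <- data.frame(
  helpful = factor(c("n", "y", "n", "y", "n", "n", "y", "y", "y")),
  helpfulhow = factor(c(
    NA,
    "Because this study cannot be put online. Thus I have to create a random wall of text.",
    NA,
    "This is a confidential study. Thus the data must be changed.",
    NA,
    NA,
    "This is a confidential study. Thus the data must be changed every time.",
    NA,
    "Qualitative studies are difficult to assess. Here is a random wall of text."
  ))
)

how      <- as.character(b$helpfulhow)        # work on text, not factor codes
is_y     <- b$helpful == "y"
answered <- !is.na(how) & trimws(how) != ""

n_yes          <- sum(is_y)                   # respondents who said "y"
n_yes_answered <- sum(is_y & answered)        # ... and gave a reason
n_yes_na       <- sum(is_y & is.na(how))      # ... but left the reason NA

n_yes_answered / n_yes
#> [1] 0.8
```

Here n_yes_na holds the count the question asks for as the alternative: the number of NA values that follow a "y".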

Answer 2

Here is a solution using the dplyr and janitor packages:

library(dplyr)
library(janitor)

df %>% 
  mutate(na_flag = ifelse(helpful == 'y' & is.na(helpfulhow), "Y", "N")) %>% 
  tabyl(na_flag) %>% 
  adorn_pct_formatting

Which gives us:

 na_flag n percent
       N 6  100.0%

If every response to helpfulhow in this sample dataset (n = 6) were NA, it would show:

 na_flag n percent
       N 4   66.7%
       Y 2   33.3%

because two respondents answered y to helpful but gave no response to helpfulhow.

If you only want to look at the y respondents, you can do the following:

df %>% 
  filter(helpful == "y") %>%
  mutate(na_flag = ifelse(is.na(helpfulhow), "Y", "N")) %>% 
  tabyl(na_flag) %>% 
  adorn_pct_formatting
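
Without janitor, a similar tabulation can be sketched with base R's table() and prop.table(). The data frame here is a small made-up stand-in that only borrows the question's column names; it is not the asker's data.

```r
# Made-up stand-in data using the question's column names (assumption)
df <- data.frame(
  helpful    = c("n", "y", "y", "n", "y", "y"),
  helpfulhow = c(NA, "reason A", NA, NA, "reason B", NA),
  stringsAsFactors = FALSE
)

# Keep only the "y" respondents, then flag a missing follow-up answer
yes_rows <- df[df$helpful == "y", ]
na_flag  <- factor(ifelse(is.na(yes_rows$helpfulhow), "Y", "N"),
                   levels = c("N", "Y"))   # fixed levels keep zero counts visible

counts <- table(na_flag)        # number of rows per flag
shares <- prop.table(counts)    # the same table as proportions

counts[["Y"]]
#> [1] 2
round(100 * shares[["Y"]], 1)
#> [1] 50
```

Fixing the factor levels up front means a flag that never occurs still appears in the table with a count of 0.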
