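I'm running a survey in which participants answer a first yes/no question and then a second, open-ended question: "If yes, why?"
I need to find the percentage of people who answered the second question after answering "yes". Alternatively, I need the number of "NA" values that follow a "yes".
Here is a dataset that looks similar: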
#> helpful helpfulhow
#> 1 n NA
#> 2 y Because this study cannot be put online. Thus I have to create a random wall of text
#> 3 n NA
#> 4 y This is a confidential study. Thus the data must be changed.
#> 5 n NA
#> 6 n NA
#> 7 y This is a confidential study. Thus the data must be changed every time.
#> 8 y NA
#> 9 y Qualitative studies are difficult to assess. Here is a random wall of text.
> str(b)
'data.frame': 9 obs. of 2 variables:
$ helpful : Factor w/ 2 levels "n","y": 1 2 1 2 1 1 2 2 2
$ helpfulhow: Factor w/ 4 levels "Because this study cannot be put online. Thus I have to create a random wall of text.",..: NA 1 NA 4 NA NA 3 NA 2
So, for example, I want to find out how many people put 'y' under helpful and also have 'NA' under helpfulhow. Thanks in advance.
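Answer 0 (score: 2)
I made a sample dataset, shown below. Here I count the rows where question 1 was answered "Yes" and question 2 is neither empty (using trimws to strip whitespace) nor NA. Dividing that count by the number of rows that answered "Yes" gives the fraction, and percent from the scales package converts it to a percentage.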
#> Name Q1 Q2
#> 1 Jerry Yes <NA>
#> 2 Beth No
#> 3 Jessica Yes
#> 4 Morty Yes Aww,Babola
#> 5 Summer No
#> 6 Rick Yes Wubbalubbadubdub
## percentage of people who answered yes to Q1 and also answered Q2
scales::percent(
  nrow(with(df, df[Q1 == "Yes" & (trimws(Q2) != "" & !is.na(Q2)), ])) /
    nrow(with(df, df[Q1 == "Yes", ]))
)
#> [1] "50.0%"
## data used in the example above
df <- structure(list(
  Name = structure(c(2L, 1L, 3L, 4L, 6L, 5L),
    .Label = c("Beth", "Jerry", "Jessica", "Morty", "Rick", "Summer"), class = "factor"),
  Q1 = structure(c(2L, 1L, 2L, 2L, 1L, 2L),
    .Label = c("No", "Yes"), class = "factor"),
  Q2 = structure(c(NA, 1L, 2L, 3L, 1L, 4L),
    .Label = c("", " ", "Aww,Babola", "Wubbalubbadubdub"), class = "factor")),
  class = "data.frame", row.names = c(NA, -6L))
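For your dataset, it would look like this (the "100%" outputs below appear to come from the six-row version of the sample that the second answer refers to; on the nine rows printed in the question, row 8 is a 'y' with NA, so the same code would report 4 of 5, i.e. 80%):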
scales::percent(nrow(with(b, b[helpful=="y" & (trimws(helpfulhow) != "" & !is.na(helpfulhow)),]))/nrow(with(b, b[helpful=="y",])))
#> [1] "100%"
To make it tidier, we can use the dplyr package:
library(dplyr)
library(scales)
percent(
b %>%
filter(helpful == "y", !is.na(helpfulhow), trimws(helpfulhow) != "") %>%
nrow(.) / {b %>% filter(helpful == "y") %>% nrow(.)})
#> [1] "100%"
Or:
b %>%
group_by(helpful) %>%
summarise(percent_helpfulhow = percent(sum(trimws(helpfulhow) != "" & !is.na(helpfulhow)) / n())) %>%
filter(helpful == "y") %>%
pull(2)
#> [1] "100%"
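As a package-free cross-check, the same share can be computed in base R by taking the mean of a logical vector over the 'y' rows, which also yields the count of missing follow-ups that the question asks about. This is only a minimal sketch, not part of the original answer: b is rebuilt here from the nine rows printed in the question, the free-text answers are replaced by short placeholder strings, and the names yes_how, answered, n_yes, n_missing and pct_answered are illustrative.

```r
# Rebuild a data frame shaped like the question's `b`.
# Assumption: the free-text answers are shortened placeholders.
b <- data.frame(
  helpful    = c("n", "y", "n", "y", "n", "n", "y", "y", "y"),
  helpfulhow = c(NA, "reason A", NA, "reason B", NA, NA, "reason C", NA, "reason D"),
  stringsAsFactors = TRUE
)

# Keep only the free-text answers of respondents who said "y".
# (Assumes `helpful` itself has no NA values, as in the sample.)
yes_how <- as.character(b$helpfulhow[b$helpful == "y"])

# TRUE where a real answer was given: not NA and not blank.
answered <- !is.na(yes_how) & trimws(yes_how) != ""

n_yes        <- length(yes_how)       # respondents who answered "y"
n_missing    <- sum(!answered)        # "y" respondents with no explanation
pct_answered <- 100 * mean(answered)  # share of "y" respondents who explained

cat(sprintf("%d of %d 'y' respondents gave a reason (%.1f%%); %d left it empty\n",
            sum(answered), n_yes, pct_answered, n_missing))
```

Because mean() of a logical vector is the proportion of TRUE values, no separate division of two nrow() calls is needed.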
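Answer 1 (score: 2)
Here is a solution using the dplyr and janitor packages (df here stands for the survey data from the question, called b there, not the sample df from the previous answer):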
library(dplyr)
library(janitor)
df %>%
mutate(na_flag = ifelse(helpful == 'y' & is.na(helpfulhow), "Y", "N")) %>%
tabyl(na_flag) %>%
adorn_pct_formatting
Which gives us:
na_flag n percent
N 6 100.0%
If, in this sample dataset (n = 6), every response to helpfulhow were NA, it would show:
na_flag n percent
N 4 66.7%
Y 2 33.3%
This is because two respondents answered y for helpful but gave no response to helpfulhow.
If you only want to look at the y respondents, you can do the following:
df %>%
filter(helpful == "y") %>%
mutate(na_flag = ifelse(is.na(helpfulhow), "Y", "N")) %>%
tabyl(na_flag) %>%
adorn_pct_formatting
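For readers without janitor installed, a similar count-and-percent table for the y respondents can be approximated with base R's table() and prop.table(). The sketch below is an illustration rather than the answer's own method: it rebuilds b from the nine rows printed in the question with placeholder answer text, and the names yes_rows, counts, shares and result are made up for the example.

```r
# Survey data shaped like the question's `b`.
# Assumption: the free-text answers are shortened placeholders.
b <- data.frame(
  helpful    = c("n", "y", "n", "y", "n", "n", "y", "y", "y"),
  helpfulhow = c(NA, "reason A", NA, "reason B", NA, NA, "reason C", NA, "reason D"),
  stringsAsFactors = TRUE
)

# Restrict to "y" respondents, then flag a missing follow-up answer.
yes_rows <- b[b$helpful == "y", ]
na_flag  <- factor(ifelse(is.na(yes_rows$helpfulhow), "Y", "N"),
                   levels = c("N", "Y"))

counts <- table(na_flag)            # number of rows per flag
shares <- 100 * prop.table(counts)  # percentage of rows per flag

# Assemble a small summary table with one row per flag.
result <- data.frame(
  na_flag = names(counts),
  n       = as.vector(counts),
  percent = sprintf("%.1f%%", as.vector(shares))
)
print(result)
```

Fixing the factor levels to N and Y keeps both rows in the table even when one of the flags never occurs.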