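I'm not sure whether this is possible. I want to use summarise to count all rows that have NA in every column except the group_by column. I can do this by putting all five conditions where NO_OL_Percent = is and joining the columns with &. If it can be done in SQL, I would expect it to be possible with dplyr or purrr, but nobody on the internet seems to have tried it.
The data has to be downloaded here.
The code is below. It works, but is there really no way to use the all function in the last line of code? I need to be able to group_by first, so I can't use filter_all in dplyr.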
library(dplyr)

# Treat "NA", "NaN" and empty strings as missing values
farmers_market <- read.csv("Export.csv", stringsAsFactors = FALSE, na.strings = c("NA", "NaN", ""))
farmers_market %>%
  select(c("Website", "Facebook", "Twitter", "Youtube", "OtherMedia", "State")) %>%
  group_by(State) %>%
  summarise(Num_Markets = n(),
            FB_Percent = 100 - 100*sum(is.na(Facebook))/n(),
            TW_Percent = 100 - 100*sum(is.na(Twitter))/n(),
            #fb=sum(is.na(Facebook)),
            OL_Percent = 100 - 100*sum(is.na(Facebook) & is.na(Twitter))/n(),
            NO_OL_Percent = 100 - 100*sum(is.na(Facebook) & is.na(Twitter) & is.na(Website) & is.na(Youtube) & is.na(OtherMedia))/n()
  )
Answer 0 (score: 1)
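Since we are summarising, I removed the select statement, because summarise keeps only the relevant columns anyway. I created a vector cols holding the names of the columns in which we want to count NA values.
We first check, for each row, whether it has NA in all of the cols columns, and assign the resulting TRUE/FALSE value to a new column all_NA. We then group_by State and compute the remaining columns as before, but for NO_OL_Percent we sum all_NA to get the number of all-NA rows in each group and divide it by the total number of rows in the group.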
library(dplyr)
cols <- c("Website", "Facebook", "Twitter", "Youtube", "OtherMedia")
farmers_market %>%
  mutate(all_NA = rowSums(is.na(.[cols])) == length(cols)) %>%
  group_by(State) %>%
  summarise(Num_Markets = n(),
            FB_Percent = 100 - 100*sum(is.na(Facebook))/n(),
            TW_Percent = 100 - 100*sum(is.na(Twitter))/n(),
            OL_Percent = 100 - 100*sum(is.na(Facebook) & is.na(Twitter))/n(),
            NO_OL_Percent = 100 - 100*sum(all_NA)/n())
# State Num_Markets FB_Percent TW_Percent OL_Percent NO_OL_Percent
# <chr> <int> <dbl> <dbl> <dbl> <dbl>
# 1 Alabama 139 25.9 5.76 25.9 37.4
# 2 Alaska 38 42.1 10.5 42.1 65.8
# 3 Arizona 92 57.6 27.2 57.6 80.4
# 4 Arkansas 111 52.3 4.50 52.3 61.3
# 5 California 759 41.5 14.5 43.2 70.1
# 6 Colorado 161 44.1 9.94 44.1 82.6
# 7 Connecticut 157 33.8 12.1 33.8 53.5
# 8 Delaware 36 61.1 11.1 61.1 83.3
# 9 District of Columbia 57 50.9 43.9 50.9 87.7
#10 Florida 262 43.1 8.78 43.1 83.2
# … with 43 more rows
This gives the same output as your current approach, but without having to write out all the column names by hand.
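Since the question asks specifically about all, here is a minimal base R sketch of the row-wise check on made-up data. The toy data frame and the names na_mat, all_na_rowsums, all_na_apply and pct_online are invented for illustration; they are not part of the farmers market data. The sketch shows that apply(..., 1, all) flags the same rows as the rowSums comparison.

```r
# Toy data standing in for the real columns (values are made up)
toy <- data.frame(
  State    = c("A", "A", "B", "B"),
  Website  = c(NA, "w", NA, NA),
  Facebook = c(NA, NA, NA, "f"),
  stringsAsFactors = FALSE
)
cols <- c("Website", "Facebook")

# Logical matrix: TRUE where a value is missing, one row per market
na_mat <- is.na(toy[cols])

# Two equivalent ways to flag rows in which every column in cols is NA
all_na_rowsums <- rowSums(na_mat) == length(cols)
all_na_apply   <- apply(na_mat, 1, all)

# Per state: percentage of rows that are NOT missing in every column
pct_online <- 100 - 100 * tapply(all_na_rowsums, toy$State, mean)
```

On a large data frame the rowSums form is likely to be faster, because apply calls all once per row, whereas rowSums is vectorised.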
Answer 1 (score: 0)
A direct way to get the Percent columns is:
farmers_market %>%
  select("Website", "Facebook", "Twitter", "Youtube", "OtherMedia", "State") %>%
  group_by(State) %>%
  summarise_all(funs("Percent" = sum(is.na(.))/n()))
# A tibble: 53 x 6
# State Website_Percent Facebook_Percent Twitter_Percent Youtube_Percent OtherMedia_Percent
# <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
#1 Alabama 0.727 0.741 0.942 0.993 0.964
#2 Alaska 0.447 0.579 0.895 1 0.974
To add a num_markets column, you can do the following:
farmers_market %>%
  select("Website", "Facebook", "Twitter", "Youtube", "OtherMedia", "State") %>%
  group_by(State) %>%
  mutate(num_markets = n()) %>%
  group_by(State, num_markets) %>%
  summarise_all(funs("Percent" = sum(is.na(.))/n()))
# A tibble: 53 x 7
# Groups: State [2]
# State num_markets Website_Percent Facebook_Percent Twitter_Percent Youtube_Percent OtherMedia_Percent
# <chr> <int> <dbl> <dbl> <dbl> <dbl> <dbl>
#1 Alabama 139 0.727 0.741 0.942 0.993 0.964
#2 Alaska 38 0.447 0.579 0.895 1 0.974
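funs() has been deprecated since dplyr 0.8.0, so on a recent dplyr the calls above produce a deprecation warning. Assuming dplyr 1.0.0 or later, roughly the same result can be sketched with across(), which also lets num_markets be computed in the same summarise call. The toy tibble below is invented for illustration and only mimics the shape of the real data.

```r
library(dplyr)

# Toy data standing in for the real columns (values are made up)
toy <- tibble(
  State    = c("A", "A", "B"),
  Website  = c(NA, "w", NA),
  Facebook = c(NA, NA, "f")
)
cols <- c("Website", "Facebook")

res <- toy %>%
  group_by(State) %>%
  summarise(
    num_markets = n(),
    # mean(is.na(x)) is the same as sum(is.na(x)) / n() within each group
    across(all_of(cols), ~ mean(is.na(.x)), .names = "{.col}_Percent")
  )
```

Naming the columns explicitly through all_of(cols) keeps across() from also picking up the num_markets column created earlier in the same summarise call.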