无法使用带有NA值的ggplot进行可视化

时间:2019-02-23 08:53:34

标签: r ggplot2 data-visualization

我正在尝试创建一个堆叠的条形图,以按组显示出现频率的百分比

library(dplyr)
library(ggplot2)

brfss_2013 %>%
  group_by(incomeLev, mentalHealth) %>%
  summarise(count_mentalHealth=n()) %>%
  group_by(incomeLev) %>%
  mutate(count_inc=sum(count_mentalHealth)) %>%
  mutate(percent=count_mentalHealth / count_inc * 100) %>%
  ungroup() %>%
  ggplot(aes(x=forcats::fct_explicit_na(incomeLev),
             y=count_mentalHealth,
             group=mentalHealth)) +
  geom_bar(aes(fill=mentalHealth), 
           stat="identity") +
  geom_text(aes(label=sprintf("%0.1f%%", percent)),
            position=position_stack(vjust=0.5))

但是,这是我收到的回溯信息:

1. dplyr::group_by(., incomeLev, mentalHealth)
8. plyr::summarise(., count_mentalHealth = n())
9. [ base::eval(...) ] with 1 more call
11. dplyr::n()
12. dplyr:::from_context("..group_size")
13. `%||%`(...)
In addition: Warning message:
  Factor `incomeLev` contains implicit NA, consider using `forcats::fct_explicit_na` 
> 

这是我的数据样本

brfss_2013 <- structure(list(incomeLev = structure(c(2L, 3L, 3L, 2L, 2L, 3L, 
NA, 2L, 3L, 1L, 3L, NA), .Label = c("$25,000-$35,000", "$50,000-$75,000", 
"Over $75,000"), class = "factor"), mentalHealth = structure(c(3L, 
1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L), .Label = c("Excellent", 
"Ok", "Very Bad"), class = "factor")), row.names = c(NA, -12L
), class = "data.frame")

更新:

str(brfss_2013)的输出:

'data.frame':   491775 obs. of  9 variables:
 $ mentalHealth: Factor w/ 5 levels "Excellent","Good",..: 5 1 1 1 1 1 3 1 1 1 ...
 $ pa1min_     : int  947 110 316 35 429 120 280 30 240 260 ...
 $ bmiLev      : Factor w/ 6 levels "Underweight",..: 5 1 3 2 5 5 2 3 4 3 ...
 $ X_drnkmo4   : int  2 0 80 16 20 0 1 2 4 0 ...
 $ X_frutsum   : num  413 20 46 49 7 157 150 67 100 58 ...
 $ X_vegesum   : num  53 148 191 136 243 143 216 360 172 114 ...
 $ sex         : Factor w/ 2 levels "Male","Female": 2 2 2 2 1 2 2 2 1 2 ...
 $ X_state     : Factor w/ 55 levels "0","Alabama",..: 2 2 2 2 2 2 2 2 2 2 ...
 $ incomeLev   : Factor w/ 4 levels "$25,000-$35,000",..: 2 4 4 2 2 4 NA 2 4 1 ...

2 个答案:

答案 0 :(得分:1)

首先,当您将所有内容转换为字符时,您的代码运行得非常好。所以你可以做

import android.app.Notification;
public class AuthenticationService extends Service {

    @Override
    public void onCreate() {
        super.onCreate();
        startForeground(1,new Notification());
    }
}

然后按照您的想法运行代码。

但是, 我们要考虑因素(在这种情况下,请不要运行brfss_2013[c("incomeLev", "mentalHealth")] <- lapply(brfss_2013[c("incomeLev", "mentalHealth")], as.character) 行!)。

您想要一个lapply(.)类别,可以通过为"missing"添加一个新级别"missing"来获得。

NA

然后,您的聚合(以基本R方式)。

levels(brfss_2013$incomeLev) <- c(levels(brfss_2013$incomeLev), "missing")
brfss_2013$incomeLev[is.na(brfss_2013$incomeLev)] <- "missing"

添加b1 <- with(brfss_2013, aggregate(list(count_mentalHealth=incomeLev), by=list(mentalHealth=mentalHealth, incomeLev=incomeLev), length)) b2 <- aggregate(mentalHealth ~ ., brfss_2013, length) names(b2)[2] <- "count_inc" brfss_2013.agg <- merge(b1, b2) rm(b1, b2) # just to clean up 列。

"percent"

图。

brfss_2013.agg$percent <- with(brfss_2013.agg, count_mentalHealth / count_inc)

结果

enter image description here

答案 1 :(得分:-1)

所以您的代码实际上对我来说很好。看来软件包版本可能存在问题,因为使用plyr summarise函数似乎很奇怪。

不过,这是创建该图形的一种更为简洁的方法(希望这对您要添加到该绘图中的任何内容都很有帮助)

brfss_2013 %>%
  # Add count of income levels first (note this only adds a variable)
  add_count(incomeLev) %>%
  rename(count_inc = n) %>% 
  # Count observations per group (this transforms data)
  count(incomeLev, mentalHealth, count_inc) %>%
  rename(count_mentalHealth = n) %>% 
  mutate(percent= count_mentalHealth / count_inc) %>%
  ggplot(aes(x= incomeLev,
             y= count_mentalHealth,
             # Technically you don't need this group here but groups can be handy
             group= mentalHealth)) + 
  geom_bar(aes(fill=mentalHealth), 
           stat="identity")+ 
  # Using the scales package does the percent formatting for you
  geom_text(aes(label = scales::percent(percent)), vjust = 1)+
  theme_minimal()

Is this hovertext?