Add summarise output to the original tibble

Time: 2017-09-26 20:03:21

Tags: r dplyr tidyverse

I want to do something between a mutate and a summarise.

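I want to calculate summary statistics on groups, but keep the original data as a nested object. I assume this is a fairly generic task, but I can't figure out how to do it without calling a join and grouping twice. Example code below: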


The following produces the desired output:

library(dplyr)
library(tidyr)

mtcars %>% 
  group_by(cyl) %>% 
  nest() %>% 
  left_join(mtcars %>% 
              group_by(cyl) %>% 
              summarise(mean_mpg = mean(mpg)),
            by = "cyl")  # name the join key explicitly

But I feel like this is not the "correct" way to do it.

1 answer:

Answer 0: (score: 2)

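Here is one way to do it without a join: use map_dbl from the purrr package (a member of the tidyverse family; it is basically map with the output as a vector of type double) to calculate the mean of mpg nested in the data column: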

mtcars %>% 
    group_by(cyl) %>% 
    nest() %>% 
    mutate(mean_mpg = map_dbl(data, ~ mean(.x$mpg)))

# A tibble: 3 x 3
#    cyl               data mean_mpg
#  <dbl>             <list>    <dbl>
#1     6  <tibble [7 x 10]> 19.74286
#2     4 <tibble [11 x 10]> 26.66364
#3     8 <tibble [14 x 10]> 15.10000
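
The same pattern extends to several statistics at once. The following is a minimal sketch rather than part of the original answer: the column names sd_mpg and n_cars and the object name nested are made up for illustration, and map_int is the integer-returning sibling of map_dbl in purrr.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Nest once, then derive each summary from the nested data column.
nested <- mtcars %>%
  group_by(cyl) %>%
  nest() %>%
  ungroup() %>%                                   # drop the grouping left over from group_by
  mutate(
    mean_mpg = map_dbl(data, ~ mean(.x$mpg)),     # one double per group
    sd_mpg   = map_dbl(data, ~ sd(.x$mpg)),       # hypothetical extra statistic
    n_cars   = map_int(data, nrow)                # rows in each nested tibble
  )

nested
```

Each map_* call walks the list column once, so the original rows stay available in data while the summaries sit next to them.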

Or you can compute mean_mpg before nesting and add mean_mpg as one of the grouping variables:

mtcars %>% 
    group_by(cyl) %>% 
    mutate(mean_mpg = mean(mpg)) %>%
    # dplyr >= 1.0.0 spells this argument .add; older versions used add = TRUE
    group_by(mean_mpg, .add = TRUE) %>%
    nest()

# A tibble: 3 x 3
#    cyl mean_mpg               data
#  <dbl>    <dbl>             <list>
#1     6 19.74286  <tibble [7 x 10]>
#2     4 26.66364 <tibble [11 x 10]>
#3     8 15.10000 <tibble [14 x 10]>
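
As a quick sanity check, the two approaches can be compared side by side. This is only a sketch: the object names via_map and via_group are made up here, and the second pipeline lists both grouping columns in a single group_by call so that it does not depend on how the add argument is spelled in a given dplyr version.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Approach 1: nest first, then summarise inside the list column.
via_map <- mtcars %>%
  group_by(cyl) %>%
  nest() %>%
  ungroup() %>%
  mutate(mean_mpg = map_dbl(data, ~ mean(.x$mpg))) %>%
  arrange(cyl)

# Approach 2: compute the group mean first, then nest by both columns.
via_group <- mtcars %>%
  group_by(cyl) %>%
  mutate(mean_mpg = mean(mpg)) %>%
  group_by(cyl, mean_mpg) %>%
  nest() %>%
  ungroup() %>%
  arrange(cyl)

# Compare the summaries and the columns kept inside the nested tibbles.
all.equal(via_map$mean_mpg, via_group$mean_mpg)
identical(names(via_map$data[[1]]), names(via_group$data[[1]]))
```

In both results the nested tibbles hold the same ten non-grouping columns of mtcars; only the position of mean_mpg in the outer tibble differs.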