Concatenating strings and computed variables in a table

Date: 2018-03-18 18:17:49

Tags: r dplyr tidyr

I am trying to work out the best workflow for producing summary text in a final report.

x <- tribble(
  ~year,       ~service,   ~account,     ~amount,
  "2001",       "Army",     "operations",  5000000,
  "2001",       "Navy",     "operations",  1500000,      
  "2002",       "Army",     "operations",  6000000,
  "2002",       "Navy",     "operations",  1700000,    
  "2001",       "Army",     "repair",       500000,
  "2001",       "Navy",     "repair",       300000,      
  "2002",       "Army",     "repair",       400000,
  "2002",       "Navy",     "repair",       600000)

The desired text for each service:

"Between [year.min] and [year.max], the [service] 
spent an average of [average amount]. The largest account
in terms of spending within the [service] was [account], 
which ranked [rank] and fluctuated between [min amount]
and [max amount], with a high of [max amount] in [year] to
a low of [min] in [year]."

The desired output would be a table. The process would be repeated at many sub-levels (accounts, sub-accounts, etc.).

service    summary_text              
  <chr>        <chr>                     
1 Army     concatenated 
2 Navy     concatenated 

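Ultimately, I would like to export the results as an HTML table with the text next to sparklines, which is fairly simple to do in Excel.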

service sparkline   summary_text              
  <chr>   <chr>      <chr>                     
1 Army    sparkline concatenated text 
2 Navy    sparkline concatenated text

2 Answers:

Answer 0: (score: 3)

Use dplyr and glue together, with different grouping strategies:

library(dplyr)
library(glue)
output <- x %>% 
  group_by(service,account) %>%
  mutate(amount_sum = sum(amount)) %>%
  group_by(service) %>%
  mutate(average.amount=mean(amount)) %>%
  filter(amount_sum == max(amount_sum)) %>%
  summarize(
    year.min=min(year),
    year.max=max(year),
    average.amount=first(average.amount),
    account=first(account),
    rank=1,
    min.amount =min(amount),
    max.amount=max(amount),
    year.min.amount = year[which.min(amount)],
    year.max.amount = year[which.max(amount)]) %>%
  transmute(service,
            summary_text= glue("Between {year.min} and {year.max}, the {service} 
                               spent an average of {average.amount}. The largest account
                               in terms of spending within the {service} was {account}, 
                               which ranked {rank} and fluctuated between {min.amount}
                               and {max.amount}, with a high of {max.amount} in {year.max.amount} to
                               a low of {min.amount} in {year.min.amount}."))

output %>% pull(summary_text)
# Between 2001 and 2002, the Army 
# spent an average of 2975000. The largest account
# in terms of spending within the Army was operations, 
# which ranked 1 and fluctuated between 5e+06
# and 6e+06, with a high of 6e+06 in 2002 to
# a low of 5e+06 in 2001.
# Between 2001 and 2002, the Navy 
# spent an average of 1025000. The largest account
# in terms of spending within the Navy was operations, 
# which ranked 1 and fluctuated between 1500000
# and 1700000, with a high of 1700000 in 2002 to
# a low of 1500000 in 2001.
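
The `5e+06` in the Army text above presumably appears because glue converts numbers to text much as `as.character()` does, and that conversion can switch to scientific notation. A minimal sketch of one way around it, using base R's `format()`; the helper name `fmt` is hypothetical and the values are copied from the output above:

```r
# Values taken from the Army output above
min.amount <- 5000000
max.amount <- 6000000

# Hypothetical helper: fixed notation with thousands separators, no padding
fmt <- function(v) format(v, big.mark = ",", scientific = FALSE, trim = TRUE)

fmt(min.amount)
fmt(max.amount)
```

Inside the glue template this would be used as `{fmt(min.amount)}` instead of `{min.amount}`.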

If you want to limit external library dependencies, you can use paste or sprintf instead of glue, but the example is more readable this way.
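
As a rough illustration of the sprintf alternative, here is a minimal sketch in base R. The data frame `s` is a hypothetical one-row-per-service summary with values copied from the output above, and only the first sentence of the template is built:

```r
# Hypothetical summary table; the numbers are copied from the output above
s <- data.frame(
  service = c("Army", "Navy"),
  year.min = c("2001", "2001"),
  year.max = c("2002", "2002"),
  average.amount = c(2975000, 1025000),
  stringsAsFactors = FALSE
)

# sprintf() is vectorised over rows: %s takes strings,
# %.0f prints a double without decimals or scientific notation
s$summary_text <- sprintf(
  "Between %s and %s, the %s spent an average of %.0f.",
  s$year.min, s$year.max, s$service, s$average.amount
)

s$summary_text
```

The trade-off is that the placeholders are positional, so the argument order has to be kept in sync with the template by hand.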

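In this example I have assumed that rank is always 1. If you want to handle sub-accounts, I suggest using the same trick I did: call group_by and mutate before the summarize call, so that you can create new columns that are constant within each group. Then call first inside summarize.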
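
A minimal sketch of how that trick could yield an actual rank instead of a hard-coded 1, assuming the rank is meant to order accounts by total spending within each service (that interpretation, and the name `ranked`, are assumptions, not part of the answer above):

```r
library(dplyr)

x <- tribble(
  ~year,  ~service, ~account,     ~amount,
  "2001", "Army",   "operations", 5000000,
  "2001", "Navy",   "operations", 1500000,
  "2002", "Army",   "operations", 6000000,
  "2002", "Navy",   "operations", 1700000,
  "2001", "Army",   "repair",      500000,
  "2001", "Navy",   "repair",      300000,
  "2002", "Army",   "repair",      400000,
  "2002", "Navy",   "repair",      600000)

ranked <- x %>%
  group_by(service, account) %>%
  mutate(amount_sum = sum(amount)) %>%             # constant within each service/account
  group_by(service) %>%
  mutate(rank = dense_rank(desc(amount_sum))) %>%  # 1 = largest account in the service
  ungroup()

# One row per service/account with its rank
ranked %>% distinct(service, account, amount_sum, rank)
```

`dense_rank` is used rather than `min_rank` because `amount_sum` is repeated on every row of an account, and `min_rank` would leave gaps in the ranks. The `rank` column can then be carried through `summarize` with `first(rank)`, as described above.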

Answer 1: (score: 0)

This is Moody Mudskipper's answer with sparklines added.

library(tidyverse)
library(sparkline)
library(formattable)
library(glue)

#Data
x <- tribble(
  ~year,       ~service,   ~account,     ~amount,
  "2001",       "Army",     "operations",  5000000,
  "2001",       "Navy",     "operations",  1500000,      
  "2002",       "Army",     "operations",  6000000,
  "2002",       "Navy",     "operations",  1700000,    
  "2001",       "Army",     "repair",       500000,
  "2001",       "Navy",     "repair",       300000,      
  "2002",       "Army",     "repair",       400000,
  "2002",       "Navy",     "repair",       600000)


# Assemble Text
table <- x %>% 
  group_by(service, year) %>% 
  summarise(total = sum(amount)) %>% 
  group_by(service) %>% 
  summarise(mean_annual_service = mean(total),
            # years range
            first.year = min(year),
            last.year = max(year),
            # min and max years, amounts
            year.min= year[which.min(total)],
            year.max = year[which.max(total)],
            min.amount = total[which.min(total)],
            max.amount = total[which.max(total)]) %>% 
  # Final Text
  mutate(Description = glue('Between {first.year} and {last.year},
                        the average spending in the {service} was 
                        ${prettyNum(mean_annual_service, big.mark = ",")},
                        with a high of ${prettyNum(max.amount, big.mark = ",")} in {year.max}, and a low of
                        ${prettyNum(min.amount, big.mark = ",")} in {year.min}') ) %>% 
  select(service, Description)


# Add Sparkline
x %>% 
    group_by(service, year) %>%
    summarise(total = sum(amount)) %>% 
    summarise(
      Sparkline = spk_chr(
        total, 
        type = "line",
        chartRangeMin=min(total), 
        chartRangeMax=max(total))) %>% 
  left_join(table, by = "service") %>% 
  formattable() %>% 
  as.htmlwidget() %>% 
  spk_add_deps()

[Image: Text and Sparklines]