Adjusting subtotals in raw data

Time: 2018-06-08 04:02:34

Tags: r dplyr tidyr data-munging

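This is a typical problem for a congressional budget analyst working with messy data.

The data frame shows the amount requested and the amount authorized for each item.

The authorized amount is sometimes more or less than the requested amount. When that happens, the adjustments (explanatory text not included here) appear in brackets below the total.

For example, in the data frame below, the authorizers adjusted item "a" (80 requested) by +19 and +1. After these adjustments, the total authorized amount for "a" is 100.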

80 requested + (19 authorized + 1 authorized) = 100 total authorized.

Goal: I want to adjust the authorized amounts based on the numbers in brackets.

library(tidyverse)

## DATA
df <- tribble(
  ~item, ~requested_amount,  ~authorized_amount,
  "a",           80,               "100",  #< Total
  "a",           NA,               "[19]", #< Adjustment from request
  "a",           NA,               "[1]",  #< Adjustment from request 
  "b",           300,              "300",  #< Total (no adjustment)
  "c",           80,                "70",  #< Total
  "c",           NA,              "[-10]"  #< Adjustment from request
              )

#> # A tibble: 6 x 3
#>   item  requested_amount    authorized_amount
#>   <chr>            <dbl>    <chr>            
#> 1 a                 80      100              
#> 2 a                 NA      [19]             
#> 3 a                 NA      [1]              
#> 4 b                300      300              
#> 5 c                 80       70               
#> 6 c                 NA      [-10]

The desired result treats the bracketed amounts as real adjustments:

Authorized amount for item "a" = (80 + 19 + 1) = 100

#>   item  requested_amount authorized_amount
#>   <chr>            <dbl>             <dbl>
#> 1 a                 80               80 #< Together... 
#> 2 a                 NA               19 #< these add...
#> 3 a                 NA                1 #< to 100 for item "a"
#> 4 b                300              300   
#> 5 c                 80               70 
#> 6 c                 NA             - 10

Created on 2018-06-07 by the reprex package (v0.2.0).

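2 answers:

Answer 0 (score: 1)

We can keep requested_amount wherever it is present and otherwise parse the number out of the bracketed string: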

library(dplyr)
library(readr)
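df %>%
    # Total rows take the requested amount; adjustment rows take the
    # number parsed from the bracketed string, e.g. "[19]" -> 19
    mutate(authorized_amount = case_when(!is.na(requested_amount) ~ 
                      requested_amount, 
             TRUE ~ parse_number(authorized_amount)))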
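
As a sanity check on the approach above, here is a minimal sketch that rebuilds the per-item totals from the adjusted column and compares them with the totals stated in the raw data. The helper column `stated_total` and the objects `adjusted` and `check` are names made up for this illustration; they are not part of the question or the answer.

```r
library(tibble)
library(dplyr)
library(readr)

# Same data as in the question
df <- tribble(
  ~item, ~requested_amount, ~authorized_amount,
  "a",    80, "100",
  "a",    NA, "[19]",
  "a",    NA, "[1]",
  "b",   300, "300",
  "c",    80, "70",
  "c",    NA, "[-10]"
)

adjusted <- df %>%
  mutate(
    # Hypothetical helper: the total as stated in the raw data (total rows only)
    stated_total = if_else(!is.na(requested_amount),
                           parse_number(authorized_amount), NA_real_),
    # The transformation from the answer
    authorized_amount = case_when(
      !is.na(requested_amount) ~ requested_amount,
      TRUE ~ parse_number(authorized_amount)
    )
  )

# Per item, the adjusted rows should add back up to the stated total
check <- adjusted %>%
  group_by(item) %>%
  summarise(
    stated_total  = sum(stated_total, na.rm = TRUE),
    rebuilt_total = sum(authorized_amount)
  )

check
```

With this data the two columns of `check` should agree for every item. For item "c", the total row becomes 80, and 80 plus the -10 adjustment gives the stated 70.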

Answer 1 (score: 0)

If I understand correctly, you need to sum authorized_amount for each item. One solution is:

library(tidyverse)
library(readr)
df %>% 
  mutate(authorized_amount = readr::parse_number(authorized_amount)) %>% 
  group_by(item) %>% 
  summarise(requested_amount = requested_amount[!is.na(requested_amount)],
            authorized_amount = sum(authorized_amount))

# A tibble: 3 x 3
  item  requested_amount authorized_amount
  <chr>            <dbl>             <dbl>
1 a                 80.0             120  
2 b                300               300  
3 c                 80.0              60.0
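
Note that the sum above adds the stated total row to its own adjustments (100 + 19 + 1 = 120 for item "a"). If the intent is one row per item holding the request plus the bracketed adjustments, a possible variant is sketched below. It assumes adjustment rows are exactly those whose `authorized_amount` is wrapped in square brackets; `is_adjustment`, `amount`, `adjustments` and `totals` are hypothetical names introduced here.

```r
library(tibble)
library(dplyr)
library(readr)

# Same data as in the question
df <- tribble(
  ~item, ~requested_amount, ~authorized_amount,
  "a",    80, "100",
  "a",    NA, "[19]",
  "a",    NA, "[1]",
  "b",   300, "300",
  "c",    80, "70",
  "c",    NA, "[-10]"
)

totals <- df %>%
  mutate(
    # Assumption: a value like "[19]" or "[-10]" marks an adjustment row
    is_adjustment = grepl("^\\[.*\\]$", authorized_amount),
    amount        = parse_number(authorized_amount)
  ) %>%
  group_by(item) %>%
  summarise(
    requested_amount  = sum(requested_amount, na.rm = TRUE),
    # Sum only the bracketed rows; items without adjustments get 0
    adjustments       = sum(amount[is_adjustment]),
    authorized_amount = requested_amount + adjustments
  )

totals
```

For this data the computed `authorized_amount` should match the totals stated in the raw table (100, 300 and 70), which makes the sketch usable as a consistency check as well.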