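This is a typical problem for congressional budget analysts working with messy data.
The data frame shows the amount requested and the amount authorized for each item.
The authorized amount is sometimes more or less than the requested amount. When that happens, the adjustments (explanatory text not included here) appear in brackets below the total.
For example, in the data frame below, authorizers adjusted item "a" (80 requested) by +19 and +1. After these adjustments, the total authorized amount for "a" is 100.
80 requested + (19 authorized + 1 authorized) = 100 total authorized.
Goal: I would like to adjust the authorized amount based on the numbers in brackets.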
library(tidyverse)
## DATA
df <- tribble(
~item, ~requested_amount, ~authorized_amount,
"a", 80, "100", #< Total
"a", NA, "[19]", #< Adjustment from request
"a", NA, "[1]", #< Adjustment from request
"b", 300, "300", #< Total (no adjustment)
"c", 80, "70", #< Total
"c", NA, "[-10]" #< Adjustment from request
)
#> # A tibble: 6 x 3
#>   item  requested_amount authorized_amount
#>   <chr>            <dbl> <chr>
#> 1 a                   80 100
#> 2 a                   NA [19]
#> 3 a                   NA [1]
#> 4 b                  300 300
#> 5 c                   80 70
#> 6 c                   NA [-10]
The desired result treats the bracketed amounts as real adjustments:
Item "a" = (80 + 19 + 1) = 100
#>   item  requested_amount authorized_amount
#>   <chr>            <dbl>             <dbl>
#> 1 a                   80                80 #< Together...
#> 2 a                   NA                19 #< these add...
#> 3 a                   NA                 1 #< to 100 for item "a"
#> 4 b                  300               300
#> 5 c                   80                70
#> 6 c                   NA               -10
Created on 2018-06-07 by the reprex package (v0.2.0).
Answer 0 (score: 1)
We can do:
library(dplyr)
library(readr)
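df %>%
  # Total rows (non-NA requested_amount) take the requested amount;
  # adjustment rows take the number parsed out of the brackets.
  mutate(authorized_amount = case_when(
    !is.na(requested_amount) ~ requested_amount,
    TRUE ~ parse_number(authorized_amount)
  ))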
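One way to sanity-check this approach is to confirm that, for each item, the adjusted column sums back to the original unbracketed total. The following is only a sketch under assumptions: it assumes every item has exactly one total row (the row with a non-NA requested_amount), and the names totals, adjusted, check, adjusted_sum, and matches are made up for illustration.

```r
library(dplyr)
library(readr)
library(tibble)

# Same data as in the question
df <- tribble(
  ~item, ~requested_amount, ~authorized_amount,
  "a",  80, "100",
  "a",  NA, "[19]",
  "a",  NA, "[1]",
  "b", 300, "300",
  "c",  80, "70",
  "c",  NA, "[-10]"
)

# Original totals: the unbracketed authorized_amount on each item's total row
# (assumption: exactly one such row per item)
totals <- df %>%
  filter(!is.na(requested_amount)) %>%
  transmute(item, total = parse_number(authorized_amount))

# Adjusted column: requested amount on total rows,
# bracketed number on adjustment rows
adjusted <- df %>%
  mutate(authorized_amount = if_else(
    !is.na(requested_amount),
    requested_amount,
    parse_number(authorized_amount)
  ))

# Per item, compare the sum of the adjusted column with the original total
check <- adjusted %>%
  group_by(item) %>%
  summarise(adjusted_sum = sum(authorized_amount)) %>%
  left_join(totals, by = "item") %>%
  mutate(matches = adjusted_sum == total)

check
```

If matches is FALSE for an item, the bracketed adjustments do not reconcile the requested amount with the stated total, which is worth flagging before trusting the adjusted column.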
Answer 1 (score: 0)
If I understand correctly, you need the sum of authorized_amount for each item. One solution is:
library(tidyverse)
library(readr)
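df %>%
  # Strip the brackets and convert the column to numeric
  mutate(authorized_amount = parse_number(authorized_amount)) %>%
  group_by(item) %>%
  # Keep the single non-NA requested amount and sum every authorized row
  summarise(requested_amount = requested_amount[!is.na(requested_amount)],
            authorized_amount = sum(authorized_amount))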
# A tibble: 3 x 3
  item  requested_amount authorized_amount
  <chr>            <dbl>             <dbl>
1 a                 80.0             120
2 b                300               300
3 c                 80.0              60.0