I need to collapse a list into a single data frame/tibble, with the name of each list element carried over to every observation.
# This chunk generates the list
library(tidyverse)  # dplyr, purrr, stringr, tidyr
library(rvest)

url <- "https://www.ato.gov.au/Rates/Individual-income-tax-for-prior-years/"
page <- read_html(url)  # download the page once and reuse it below

pit_sch <- page %>%
  html_table() %>%
  # name each table after the text of its <caption>
  setNames(page %>% html_nodes("caption") %>% html_text()) %>%
  map(function(df) mutate(df,
    `Tax on this income` = gsub(",", "", `Tax on this income`),
    cumm_tax_amt = as.numeric(str_extract(`Tax on this income`, "(?<=^\\$)\\d+")),
    # the decimal point is escaped so that whole-number rates also match
    tax_rate = as.numeric(str_extract(`Tax on this income`, "\\d+\\.?\\d*(?=\\s*c)")),
    threshold = as.numeric(str_extract(`Tax on this income`, "(?<=\\$)\\d+$"))
  )) %>%
  map(~ drop_na(.x, threshold)) %>%
  # mutate_each() and funs() are deprecated; across() replaces them
  map(function(df) mutate(df, across(everything(), ~ replace(.x, is.na(.x), 0))))
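The code below does create the data frame I want, but it does not include the name of the list element on each observation, which is what I need.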
map_df(pit_sch, `[`, c("Taxable income", "Tax on this income", "cumm_tax_amt", "tax_rate", "threshold"))
The output should include the name of the list element each row came from: "table_name", "Taxable income", "Tax on this income", "cumm_tax_amt", "tax_rate", "threshold"
Answer 0 (score: 1)
We can use bind_rows with .id to create a single data.frame in which "table_name" is a new column taken from the names of the list.
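As a minimal sketch of how bind_rows behaves with .id, the example below uses a small hand-made list, toy_sch, as a hypothetical stand-in for pit_sch (its table names and values are invented for illustration and are not taken from the ATO page). The commented-out lines show how the same call might be applied to the list from the question, assuming pit_sch has the columns listed there.

```r
library(dplyr)

# Hypothetical stand-in for pit_sch: a named list of data frames.
toy_sch <- list(
  `Table A` = tibble(`Taxable income` = c("range 1", "range 2"),
                     threshold = c(0, 100)),
  `Table B` = tibble(`Taxable income` = "range 1",
                     threshold = 0)
)

# .id adds a leading column that holds the name of the list element
# each row came from.
combined <- bind_rows(toy_sch, .id = "table_name")

# Applied to the list from the question (assumes those columns exist):
# bind_rows(pit_sch, .id = "table_name") %>%
#   select(table_name, `Taxable income`, `Tax on this income`,
#          cumm_tax_amt, tax_rate, threshold)
```

If the map_df call from the question is preferred, it also accepts an .id argument, so adding .id = "table_name" to that call should give a similar result.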