Grouping a bank statement in R

Time: 2017-05-03 00:54:50

Tags: r inner-join banking

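I'm analysing my bank statement by grouping purchases by retailer name, so that the resulting data frame can then be analysed further. My approach uses a custom function and works, but I'd like to know whether there is a more efficient way. For example, are there any packages that can join data frames using complex matching logic between their columns?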


1 answer:

Answer 0 (score: 4)


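library(tidyverse)  # dplyr (mutate, %>%) and stringr (str_extract, str_c, regex)

Statement <- data.frame(
  Purchase = c("abc Aldi xyz","a Kmart bcd","a STARBUCKS ghju","abcd MacD efg"),
  Amount = c(235,23,789,45),
  stringsAsFactors = FALSE)  # keep Purchase as character (the default only from R 4.0)

RetailerNames <- c("Aldi","Kmart","Starbucks","MacD")

# Build one alternation pattern "Aldi|Kmart|Starbucks|MacD", match it case-insensitively,
# and extract the first retailer name found in each purchase description.
# (glue::collapse() used here originally is deprecated; str_c(collapse =) does the same job.)
Statement %>% 
  mutate(
    Retailer = Purchase %>% 
      str_extract(RetailerNames %>% str_c(collapse = "|") %>% regex(ignore_case = TRUE))
    )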
#>           Purchase Amount  Retailer
#> 1     abc Aldi xyz    235      Aldi
#> 2      a Kmart bcd     23     Kmart
#> 3 a STARBUCKS ghju    789 STARBUCKS
#> 4    abcd MacD efg     45      MacD
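The question's end goal is to group purchases by retailer. The sketch below shows that next step, building on the extraction above. Because str_extract() returns the name as spelled in the purchase ("STARBUCKS"), the sketch lower-cases it first so that different spellings of one retailer land in the same group. It then totals the amounts per retailer. The names retailer_pattern and Spend, and the lower-casing step, are illustrative choices rather than part of the original answer.

```r
library(tidyverse)

Statement <- tibble(
  Purchase = c("abc Aldi xyz", "a Kmart bcd", "a STARBUCKS ghju", "abcd MacD efg"),
  Amount = c(235, 23, 789, 45))
RetailerNames <- c("Aldi", "Kmart", "Starbucks", "MacD")

# One case-insensitive alternation pattern: "Aldi|Kmart|Starbucks|MacD"
retailer_pattern <- regex(str_c(RetailerNames, collapse = "|"), ignore_case = TRUE)

Spend <- Statement %>%
  # Lower-case the extracted text so "STARBUCKS" and "Starbucks" form one group
  mutate(Retailer = str_to_lower(str_extract(Purchase, retailer_pattern))) %>%
  group_by(Retailer) %>%
  # Total spend and number of purchases per retailer
  summarise(Total = sum(Amount), Purchases = n())

Spend
```

Purchases that match no retailer get NA in Retailer and are collected in their own NA group. That makes it easy to spot descriptions the name list doesn't cover yet.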

If you would rather go the left_join route, try:

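library(fuzzyjoin)  # regex_left_join()

RetailerNames <- tibble(Retailer = c("Aldi","Kmart","Starbucks","MacD"))  # tibble() replaces the deprecated data_frame()

# Join each purchase to every retailer whose name (treated as a regex) occurs in it;
# ignore_case = TRUE so that "a STARBUCKS ghju" still matches "Starbucks"
Statement %>%
  regex_left_join(RetailerNames, by = c(Purchase = "Retailer"), ignore_case = TRUE)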
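The question also asks whether a package can join data frames using more complex matching logic. fuzzyjoin's fuzzy_left_join() accepts an arbitrary vectorised match_fun, so the join rule can be any function that returns TRUE or FALSE for each (purchase, retailer) pair. The sketch below uses a hypothetical rule, matches_whole_word(), which accepts a retailer only when its name appears as a whole word, ignoring case. You can swap in whatever rule your statement needs.

```r
library(tidyverse)
library(fuzzyjoin)

Statement <- tibble(
  Purchase = c("abc Aldi xyz", "a Kmart bcd", "a STARBUCKS ghju", "abcd MacD efg"),
  Amount = c(235, 23, 789, 45))
Retailers <- tibble(Retailer = c("Aldi", "Kmart", "Starbucks", "MacD"))

# Hypothetical matching rule: the retailer name must occur as a whole word,
# ignoring case. fuzzy_left_join() passes vectors of candidate (purchase,
# retailer) pairs and expects a logical vector of the same length back.
matches_whole_word <- function(purchase, retailer) {
  str_detect(purchase, regex(str_c("\\b", retailer, "\\b"), ignore_case = TRUE))
}

Matched <- Statement %>%
  fuzzy_left_join(Retailers,
                  by = c(Purchase = "Retailer"),
                  match_fun = matches_whole_word)

Matched
```

As with any left join, a purchase that matches several retailers produces several rows. A purchase that matches none is kept, with NA in Retailer.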