How to "tidy" Quickbooks Journal data for analysis in R

Date: 2019-12-07 21:06:04

Tags: r dplyr

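Question

If you export Quickbooks Journal data to an Excel file, you run into an analyst's nightmare: the rolled-up data does not carry its "roll-up" information. After some data engineering that I know how to do, what remains is this: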

date,transaction_type,num,account,debit,credit
12/01/2019,Bill,4296-4301,Accounts Payable,NA,30734.37
NA,NA,NA,Warehouse:NJ Warehouse Rent,10642.79,NA
NA,NA,NA,Warehouse:NJ Warehouse Rent,7476.17,NA
NA,NA,NA,Warehouse:NJ Warehouse Rent,2337.86,NA
NA,NA,NA,Warehouse:NJ Warehouse Rent,3915.85,NA
NA,NA,NA,Warehouse:NJ Warehouse Rent,2878.78,NA
NA,NA,NA,Warehouse:NJ Warehouse Rent,3482.92,NA
12/01/2019,Bill,4953268,Accounts Payable,NA,173.8
NA,NA,NA,Warehouse:Warehouse Expense,173.8,NA
12/01/2019,Bill,198288,Accounts Payable,NA,750
NA,NA,NA,Office Expense:Accounting,750,NA

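Now I am left with the data engineering I do not know how to do: how do I fill the NA cells with the date, transaction type, and other identifying values of the transaction they should roll up to?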

Then num and debit would "go together" in a tidy way.

1 Answer:

Answer 0 (score: 2)

One option is to use fill and then reshape to 'long' format with pivot_longer

library(dplyr)
library(tidyr)
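df1 %>%
   # carry each transaction's header values down into its detail rows
   fill(date, transaction_type, num) %>%
   # stack the debit and credit columns into a name column and a value column
   pivot_longer(cols = debit:credit, 
        names_to = 'type', values_to = 'credit_debit_value')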
# A tibble: 22 x 6
#   date       transaction_type num       account                     type   credit_debit_value
#   <chr>      <chr>            <chr>     <chr>                       <chr>               <dbl>
# 1 12/01/2019 Bill             4296-4301 Accounts Payable            debit                 NA 
# 2 12/01/2019 Bill             4296-4301 Accounts Payable            credit             30734.
# 3 12/01/2019 Bill             4296-4301 Warehouse:NJ Warehouse Rent debit              10643.
# 4 12/01/2019 Bill             4296-4301 Warehouse:NJ Warehouse Rent credit                NA 
# 5 12/01/2019 Bill             4296-4301 Warehouse:NJ Warehouse Rent debit               7476.
# 6 12/01/2019 Bill             4296-4301 Warehouse:NJ Warehouse Rent credit                NA 
# 7 12/01/2019 Bill             4296-4301 Warehouse:NJ Warehouse Rent debit               2338.
# 8 12/01/2019 Bill             4296-4301 Warehouse:NJ Warehouse Rent credit                NA 
# 9 12/01/2019 Bill             4296-4301 Warehouse:NJ Warehouse Rent debit               3916.
#10 12/01/2019 Bill             4296-4301 Warehouse:NJ Warehouse Rent credit                NA 
# … with 12 more rows
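
The long output above keeps an NA row for every empty debit or credit slot, and date is still a character column. A minimal sketch of one way to handle both, assuming tidyr 1.0.0 or later (for pivot_longer and its values_drop_na argument). The data frame journal is a hypothetical four-row subset of the question's data, and the names journal_long and amount are illustrative only:

```r
library(dplyr)
library(tidyr)

# Hypothetical subset of the question's data: two transactions,
# each with one header row and one detail row
journal <- data.frame(
  date = c("12/01/2019", NA, "12/01/2019", NA),
  transaction_type = c("Bill", NA, "Bill", NA),
  num = c("4953268", NA, "198288", NA),
  account = c("Accounts Payable", "Warehouse:Warehouse Expense",
              "Accounts Payable", "Office Expense:Accounting"),
  debit = c(NA, 173.8, NA, 750),
  credit = c(173.8, NA, 750, NA),
  stringsAsFactors = FALSE
)

journal_long <- journal %>%
  # copy header values down into the detail rows
  fill(date, transaction_type, num) %>%
  # assumes the export uses month/day/year dates, as in the sample
  mutate(date = as.Date(date, format = "%m/%d/%Y")) %>%
  # reshape to long format and drop the empty debit/credit slots
  pivot_longer(cols = debit:credit,
               names_to = "type", values_to = "amount",
               values_drop_na = TRUE)

print(journal_long)
```

This leaves one row per actual posting, which may be easier to summarise than the version with NA placeholders.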

Data

df1 <- structure(list(date = c("12/01/2019", NA, NA, NA, NA, NA, NA, 
"12/01/2019", NA, "12/01/2019", NA), transaction_type = c("Bill", 
NA, NA, NA, NA, NA, NA, "Bill", NA, "Bill", NA), num = c("4296-4301", 
NA, NA, NA, NA, NA, NA, "4953268", NA, "198288", NA), account = 
 c("Accounts Payable", 
"Warehouse:NJ Warehouse Rent", "Warehouse:NJ Warehouse Rent", 
"Warehouse:NJ Warehouse Rent", "Warehouse:NJ Warehouse Rent", 
"Warehouse:NJ Warehouse Rent", "Warehouse:NJ Warehouse Rent", 
"Accounts Payable", "Warehouse:Warehouse Expense", "Accounts Payable", 
"Office Expense:Accounting"), debit = c(NA, 10642.79, 7476.17, 
2337.86, 3915.85, 2878.78, 3482.92, NA, 173.8, NA, 750), credit = c(30734.37, 
NA, NA, NA, NA, NA, NA, 173.8, NA, 750, NA)),
 class = "data.frame", row.names = c(NA, 
-11L))
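
Once num has been filled down, one possible sanity check is that each transaction's debits equal its credits. The sketch below is an assumption about how such a check could look, not part of the original answer; it rebuilds a hypothetical subset of the sample data so that it runs on its own, and the names journal, balance, and balanced are illustrative:

```r
library(dplyr)
library(tidyr)

# Hypothetical subset of the sample data: the seven-row rent bill
# and the two-row accounting bill
journal <- data.frame(
  num = c("4296-4301", NA, NA, NA, NA, NA, NA, "198288", NA),
  debit = c(NA, 10642.79, 7476.17, 2337.86, 3915.85, 2878.78, 3482.92,
            NA, 750),
  credit = c(30734.37, NA, NA, NA, NA, NA, NA, 750, NA),
  stringsAsFactors = FALSE
)

balance <- journal %>%
  # attach every detail row to its transaction number
  fill(num) %>%
  group_by(num) %>%
  summarise(total_debit = sum(debit, na.rm = TRUE),
            total_credit = sum(credit, na.rm = TRUE)) %>%
  # compare with a small tolerance to allow for floating-point rounding
  mutate(balanced = abs(total_debit - total_credit) < 1e-6)

print(balance)
```

A transaction that shows balanced as FALSE would suggest that fill attached a detail row to the wrong header, or that the export was truncated.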