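If you export Quickbooks Journal data to an Excel file, you are met with an analyst's nightmare: rolled-up data that does not carry the "roll-up" information on every row. After some data engineering that I do know how to do, this is what remains: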
date,transaction_type,num,account,debit,credit
12/01/2019,Bill,4296-4301,Accounts Payable,NA,30734.37
NA,NA,NA,Warehouse:NJ Warehouse Rent,10642.79,NA
NA,NA,NA,Warehouse:NJ Warehouse Rent,7476.17,NA
NA,NA,NA,Warehouse:NJ Warehouse Rent,2337.86,NA
NA,NA,NA,Warehouse:NJ Warehouse Rent,3915.85,NA
NA,NA,NA,Warehouse:NJ Warehouse Rent,2878.78,NA
NA,NA,NA,Warehouse:NJ Warehouse Rent,3482.92,NA
12/01/2019,Bill,4953268,Accounts Payable,NA,173.8
NA,NA,NA,Warehouse:Warehouse Expense,173.8,NA
12/01/2019,Bill,198288,Accounts Payable,NA,750
NA,NA,NA,Office Expense:Accounting,750,NA
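Now I am left with the data engineering that I do not know how to do: how do I fill the NA rows with the date, num, and transaction_type values they should be rolled up to?
After that, the debit and credit columns could be "gathered" together in a tidy way.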
Answer 0 (score: 2)
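One option is to use fill to carry the last non-missing date, transaction_type, and num down into the NA rows, and then reshape to 'long' format with pivot_longer: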
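library(dplyr)
library(tidyr)
df1 %>%
  # carry the last non-NA value downward (the default direction of fill)
  fill(date, transaction_type, num) %>%
  # stack the debit and credit columns into name/value pairs
  pivot_longer(cols = debit:credit,
               names_to = 'type', values_to = 'credit_debit_value')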
# A tibble: 22 x 6
# date transaction_type num account type credit_debit_value
# <chr> <chr> <chr> <chr> <chr> <dbl>
# 1 12/01/2019 Bill 4296-4301 Accounts Payable debit NA
# 2 12/01/2019 Bill 4296-4301 Accounts Payable credit 30734.
# 3 12/01/2019 Bill 4296-4301 Warehouse:NJ Warehouse Rent debit 10643.
# 4 12/01/2019 Bill 4296-4301 Warehouse:NJ Warehouse Rent credit NA
# 5 12/01/2019 Bill 4296-4301 Warehouse:NJ Warehouse Rent debit 7476.
# 6 12/01/2019 Bill 4296-4301 Warehouse:NJ Warehouse Rent credit NA
# 7 12/01/2019 Bill 4296-4301 Warehouse:NJ Warehouse Rent debit 2338.
# 8 12/01/2019 Bill 4296-4301 Warehouse:NJ Warehouse Rent credit NA
# 9 12/01/2019 Bill 4296-4301 Warehouse:NJ Warehouse Rent debit 3916.
#10 12/01/2019 Bill 4296-4301 Warehouse:NJ Warehouse Rent credit NA
# … with 12 more rows
# data
df1 <- structure(list(date = c("12/01/2019", NA, NA, NA, NA, NA, NA,
"12/01/2019", NA, "12/01/2019", NA), transaction_type = c("Bill",
NA, NA, NA, NA, NA, NA, "Bill", NA, "Bill", NA), num = c("4296-4301",
NA, NA, NA, NA, NA, NA, "4953268", NA, "198288", NA), account =
c("Accounts Payable",
"Warehouse:NJ Warehouse Rent", "Warehouse:NJ Warehouse Rent",
"Warehouse:NJ Warehouse Rent", "Warehouse:NJ Warehouse Rent",
"Warehouse:NJ Warehouse Rent", "Warehouse:NJ Warehouse Rent",
"Accounts Payable", "Warehouse:Warehouse Expense", "Accounts Payable",
"Office Expense:Accounting"), debit = c(NA, 10642.79, 7476.17,
2337.86, 3915.85, 2878.78, 3482.92, NA, 173.8, NA, 750), credit = c(30734.37,
NA, NA, NA, NA, NA, NA, 173.8, NA, 750, NA)),
class = "data.frame", row.names = c(NA,
-11L))
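
As a possible follow-up to the fill and pivot_longer approach, the sketch below drops the empty debit/credit slots and checks that each journal entry balances. It is only an illustration under stated assumptions: the names journal, tidy, balance, and amount are hypothetical (not from the question), the sample is re-read from inline CSV text so the block runs on its own, and values_drop_na and the .groups argument assume tidyr 1.0 or later and dplyr 1.0 or later.

```r
library(dplyr)
library(tidyr)

# Sample rows from the question, read from inline CSV text.
# read.csv treats the literal string "NA" as a missing value by default.
journal <- read.csv(text = "date,transaction_type,num,account,debit,credit
12/01/2019,Bill,4296-4301,Accounts Payable,NA,30734.37
NA,NA,NA,Warehouse:NJ Warehouse Rent,10642.79,NA
NA,NA,NA,Warehouse:NJ Warehouse Rent,7476.17,NA
NA,NA,NA,Warehouse:NJ Warehouse Rent,2337.86,NA
NA,NA,NA,Warehouse:NJ Warehouse Rent,3915.85,NA
NA,NA,NA,Warehouse:NJ Warehouse Rent,2878.78,NA
NA,NA,NA,Warehouse:NJ Warehouse Rent,3482.92,NA
12/01/2019,Bill,4953268,Accounts Payable,NA,173.8
NA,NA,NA,Warehouse:Warehouse Expense,173.8,NA
12/01/2019,Bill,198288,Accounts Payable,NA,750
NA,NA,NA,Office Expense:Accounting,750,NA
", stringsAsFactors = FALSE)

tidy <- journal %>%
  # fill the header fields down into the detail rows
  fill(date, transaction_type, num) %>%
  # reshape to long format, keeping only the side that has a value
  pivot_longer(cols = debit:credit,
               names_to = "type",
               values_to = "amount",       # hypothetical column name
               values_drop_na = TRUE)

# Sanity check: total debits should equal total credits per entry.
balance <- tidy %>%
  group_by(date, num, type) %>%
  summarise(total = sum(amount), .groups = "drop") %>%
  pivot_wider(names_from = type, values_from = total)

# Compare with a tolerance rather than ==, since the amounts are doubles.
all.equal(balance$debit, balance$credit)
```

With values_drop_na = TRUE each original row contributes a single long-format row, because every row in this sample has exactly one of debit or credit filled in. Whether to drop the NA side or keep it, as in the answer above, depends on whether the downstream analysis needs an explicit row for both sides.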