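I have a text file containing a company's balance sheet information. The problem is that the spacing is uneven, and the data I get looks like this: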
28/07/15 2.85 104,689.13
30/07/15 31,862.00 136,551.13
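This is because 2.85 in the first line is a debit, while the amount in the second line is a credit.
How can I read the data in Ruby so that I get 4 elements from each line, with the credit empty in the first line and the debit empty in the second?
I could split the data on runs of spaces and then compare the balances of consecutive lines to work out which amounts are credits and which are debits. I'd like to know if there is a better way (maybe a regex) to do this.
Thanks.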
Answer 0 (score: 2)
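Here's a way that works even if the lines are really messed up. It relies on the fact that a debit (credit) decreases (increases) the balance by the amount of the debit (credit). Let's first write some data to a file: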
data =<<_
28/07/15 2.85 104,689.13
30/07/15 31,862.00 136,551.13
28/07/15 1.13 136,550.00
30/07/15 10,000.01 146,550.01
_
FName = 'temp'
IO.write(FName, data)
#=> 288
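Here's a method to extract the fields. It takes the file name and the starting balance. Alternatively, the second argument could be a boolean indicating whether the first line contains a debit or a credit.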
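require 'bigdecimal'
# Classifies each line as a debit or a credit by checking whether subtracting
# the amount from the running balance gives the balance shown on that line.
# BigDecimal() replaces BigDecimal.new, which bigdecimal 2.0+ no longer provides.
def extract_transactions(fname, starting_balance)
  transactions = []
  IO.readlines(fname).reduce(BigDecimal(starting_balance)) do |start_bal, s|
    date, debit_or_credit, bal = s.strip.delete(',').split(/\s+/)
    h = { date: date, debit: '', credit: '', balance: bal }
    # A debit reduces the balance by exactly the amount; anything else is a credit.
    if BigDecimal(bal) == start_bal - BigDecimal(debit_or_credit)
      h[:debit] = debit_or_credit
    else
      h[:credit] = debit_or_credit
    end
    transactions << h
    BigDecimal(bal)  # becomes start_bal for the next line
  end
  transactions
end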
Let's try it:
extract_transactions(FName, "104691.98")
#=> [{:date=>"28/07/15", :debit=>"2.85", :credit=>"", :balance=>"104689.13"},
# {:date=>"30/07/15", :debit=>"", :credit=>"31862.00", :balance=>"136551.13"},
# {:date=>"28/07/15", :debit=>"1.13", :credit=>"", :balance=>"136550.00"},
# {:date=>"30/07/15", :debit=>"", :credit=>"10000.01", :balance=>"146550.01"}]
I used BigDecimal to avoid rounding errors.
Enumerable#reduce (aka inject) updates the balance (start_bal, initially starting_balance) after each transaction (line).
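The equality test in `extract_transactions` only works with exact decimal arithmetic. The following sketch shows why, using illustrative values only. `Float` cannot represent most decimal fractions exactly, so a computed sum can fail an equality check even when the arithmetic is correct on paper. A `BigDecimal` built from a string keeps the decimal digits exact.

```ruby
require 'bigdecimal'

# Float arithmetic: 0.1 and 0.2 have no exact binary representation,
# so their sum is not exactly 0.3 and the equality test fails.
float_equal = (0.1 + 0.2 == 0.3)

# BigDecimal built from strings stores the decimal digits exactly,
# so the same comparison succeeds.
decimal_equal = (BigDecimal("0.1") + BigDecimal("0.2") == BigDecimal("0.3"))

# The same check applied to the first transaction in the sample data:
# starting balance minus the amount equals the new balance, so it is a debit.
start   = BigDecimal("104691.98")
amount  = BigDecimal("2.85")
balance = BigDecimal("104689.13")
is_debit = (start - amount == balance)

p float_equal    # => false
p decimal_equal  # => true
p is_debit       # => true
```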
Edit: here's a (better) variant that doesn't use BigDecimal:
def extract_transactions(fname, debit_first)
  # The first line has no previous balance. Starting from +infinity makes it a
  # debit; starting from -infinity makes it a credit.
  curr_bal = (debit_first ? Float::INFINITY : -Float::INFINITY)
  IO.readlines(fname).each_with_object([]) do |s, transact|
    date, debit, bal = s.strip.split(/\s+/)
    credit = ''
    bal_float = bal.delete(',').to_f
    # A higher balance than before means the amount was a credit.
    (debit, credit = credit, debit) if bal_float > curr_bal
    transact << { date: date, debit: debit, credit: credit, balance: bal }
    curr_bal = bal_float
  end
end
extract_transactions(FName, true)
#=> [{:date=>"28/07/15", :debit=>"2.85", :credit=>"", :balance=>"104,689.13"},
# {:date=>"30/07/15", :debit=>"", :credit=>"31,862.00", :balance=>"136,551.13"},
# {:date=>"28/07/15", :debit=>"1.13", :credit=>"", :balance=>"136,550.00"},
# {:date=>"30/07/15", :debit=>"", :credit=>"10,000.01", :balance=>"146,550.01"}]
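The following sketch applies the same balance-went-up-or-down idea without `Float`. The method name `classify_lines` is hypothetical, and parsing with `Rational` is my own choice, not part of the answer. The sketch accepts any enumerable of lines, so a `StringIO` or an array works as well as a file. It strips thousands separators only before converting the balance; the returned strings keep them.

```ruby
require 'stringio'

# Hypothetical helper: classifies each "date amount balance" line as a debit
# or a credit by checking whether the balance went up or down.
# `debit_first` decides the first line, which has no previous balance.
def classify_lines(lines, debit_first)
  prev = nil
  lines.each_with_object([]) do |line, out|
    date, amount, bal = line.split
    # Rational parses decimal strings exactly, e.g. "104689.13" -> 10468913/100
    bal_value = Rational(bal.delete(','))
    is_credit = prev.nil? ? !debit_first : bal_value > prev
    out << { date: date,
             debit: is_credit ? '' : amount,
             credit: is_credit ? amount : '',
             balance: bal }
    prev = bal_value
  end
end

io = StringIO.new(<<~DATA)
  28/07/15 2.85 104,689.13
  30/07/15 31,862.00 136,551.13
DATA
rows = classify_lines(io.each_line, true)
rows.each { |r| p r }
```

Comparing exact rationals costs a little more than comparing floats. Here it only matters if two balances differ by less than a float can resolve, so either choice is reasonable for this data.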
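Answer 1 (score: 1)
The only constant is the line length: 71 characters. Adding 1 gives 72, which divides evenly into 4 columns, so each column is 18 characters wide. We can try to use that: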
data = %q|28/07/15 2.85 104,689.13
30/07/15 31,862.00 136,551.13|
data.split($/).map do |line|
  # 18 = (line length + 1) / number of columns = 72 / 4
  line.chars.each_slice(18).map(&:join).map(&:strip)
end
#⇒ [
# [0] [
# [0] "28/07/15",
# [1] "2.85",
# [2] "",
# [3] "104,689.13"
# ],
# [1] [
# [0] "30/07/15",
# [1] "",
# [2] "31,862.00",
# [3] "136,551.13"
# ]
# ]
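`each_slice(18)` relies on every column being exactly 18 characters wide. If the columns have different widths, the same idea works with an explicit list of widths. This is a sketch under that assumption. `slice_columns` is a hypothetical helper, and the widths `[8, 28, 18, 17]` are the ones assumed in Answer 2 below. The sample lines are built with `format`, so their padding is known rather than copied from the real file.

```ruby
# Hypothetical helper: cut a fixed-width line into fields of the given widths
# and strip the padding around each field.
def slice_columns(line, widths)
  pos = 0
  widths.map do |w|
    field = line[pos, w].to_s.strip  # to_s guards against lines shorter than expected
    pos += w
    field
  end
end

WIDTHS = [8, 28, 18, 17]

# Correctly padded sample lines: date left-aligned, amounts right-aligned.
line1 = format("%-8s%28s%18s%17s", "28/07/15", "2.85", "", "104,689.13")
line2 = format("%-8s%28s%18s%17s", "30/07/15", "", "31,862.00", "136,551.13")

p slice_columns(line1, WIDTHS)  # => ["28/07/15", "2.85", "", "104,689.13"]
p slice_columns(line2, WIDTHS)  # => ["30/07/15", "", "31,862.00", "136,551.13"]
```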
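Answer 2 (score: 1)
How can I get the data in Ruby so that I get 4 elements from each line, with the credit empty in the first line and the debit empty in the second ... I'd like to know if there is a better way (maybe a regex)
I'm only extracting the last 4 columns.
I'll assume the following column widths: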
+--------------+--------+----------------------------+------------------+-----------------+
|Previous col | DATE | DEBIT | CREDIT | BALANCE |
| (any width) | width=8| (width=28) | (width=18) | (width=17) |
+--------------+--------+----------------------------+------------------+-----------------+
| ... |28/07/15| 2.85 | | 104,689.13|
| ... |30/07/15| | 31,862.00 | 136,551.13|
+--------------+--------+----------------------------+------------------+-----------------+
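If you think about it, we can match the entire width of the last column with /.{17}$/. The trick is to place a lookahead at the start of that column, 17 characters before the end of the line. The lookahead captures the field's value without consuming anything, and then .{17} moves forward over the column: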
/(?=[ ]{0,16}([\d,.]+)).{17}$/
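The lookahead-then-consume step can be checked on its own. This is a minimal sketch that assumes the balance is right-aligned in the last 17 characters of the line. The sample line is built with `format`, so its padding is known.

```ruby
# Build a line whose last 17 characters are the balance column.
line = format("%-8s%28s%18s%17s", "28/07/15", "2.85", "", "104,689.13")

# (?=...) captures the number from the start of the last 17-character column
# without consuming anything; .{17}$ then consumes the column up to end of line.
balance_re = /(?=[ ]{0,16}([\d,.]+)).{17}$/

balance = line[balance_re, 1]
p balance  # => "104,689.13"
```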
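Credit is the previous column, 18 characters wide (/.{18}/). Because it is an optional field, its lookahead has to be wrapped in an optional group. Prefixing this pattern to the previous regex gives: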
/(?:(?=[ ]{0,17}([\d,.]+)))?.{18}(?=[ ]{0,16}([\d,.]+)).{17}$/
Applying the same logic to all 4 fields gives this one-line regex (broken down in the code below):
/(?=[ ]{0,7}(?<date>[\d\/]+)).{8}(?:(?=[ ]{0,27}(?<debit>[\d,.]+)))?.{28}(?:(?=[ ]{0,17}(?<credit>[\d,.]+)))?.{18}(?=[ ]{0,16}(?<balance>[\d,.]+)).{17}[ ]*$/
data = %q|28/07/15 2.85 104,689.13
30/07/15 31,862.00 136,551.13|
regex = /
  (?=[ ]{0,7}(?<date>[\d\/]+))           # Field 1: date
  .{8}                                   # column 1 (width=8)
  #
  (?:(?=[ ]{0,27}(?<debit>[\d,.]+)))?    # Field 2: debit (optional)
  .{28}                                  # column 2 (width=28)
  #
  (?:(?=[ ]{0,17}(?<credit>[\d,.]+)))?   # Field 3: credit (optional)
  .{18}                                  # column 3 (width=18)
  #
  (?=[ ]{0,16}(?<balance>[\d,.]+))       # Field 4: balance
  .{17}                                  # column 4 (width=17)
  #
  [ ]*$                                  # optional spaces -> EoL
/x
# hash from all named captures from all matches
result = data.scan(regex).collect do |match| Hash[regex.names.zip(match)] end
p result
#=> [{"date"=>"28/07/15", "debit"=>"2.85", "credit"=>nil, "balance"=>"104,689.13"},
# {"date"=>"30/07/15", "debit"=>nil, "credit"=>"31,862.00", "balance"=>"136,551.13"}]
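The pattern depends on exact column positions, so it only matches lines that really are padded to the assumed widths of 8, 28, 18 and 17 characters. This sketch builds two such lines with `format`, assuming right-aligned amounts, and runs a version of the pattern over them.

One deliberate change from the answer: each optional field is written as a lookahead with an empty alternative, `(?=...|)`, instead of `(?:(?=...))?`. The lookahead always succeeds and sets the named capture only when a number is present, so no quantifier is applied to a zero-width group.

```ruby
# Column widths assumed by the answer: date 8, debit 28, credit 18, balance 17.
regex = /
  (?=[ ]{0,7}(?<date>[\d\/]+))        # date (required)
  .{8}                                # consume the date column
  (?=[ ]{0,27}(?<debit>[\d,.]+)|)     # debit (optional: empty alternative)
  .{28}                               # consume the debit column
  (?=[ ]{0,17}(?<credit>[\d,.]+)|)    # credit (optional: empty alternative)
  .{18}                               # consume the credit column
  (?=[ ]{0,16}(?<balance>[\d,.]+))    # balance (required)
  .{17}                               # consume the balance column
  [ ]*$                               # trailing spaces, then end of line
/x

rows = [
  ["28/07/15", "2.85", "", "104,689.13"],
  ["30/07/15", "", "31,862.00", "136,551.13"]
]
# Pad every field to its column width; amounts are right-aligned.
data = rows.map { |r| format("%-8s%28s%18s%17s", *r) }.join("\n")

# One hash of named captures per matching line.
result = data.scan(regex).map { |match| regex.names.zip(match).to_h }
result.each { |h| p h }
```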