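I have a method that searches the lines of a text file and stores them in a hash, based on a list of words.
The method does two simple things:
If a line matches, it is parsed with regexes and stored under the "found" category; otherwise the result is stored under the "unfound" category.
My problem is with the "unfound" part: every line ends up in the unfound category. What I need is for the unfound transactions to be only the lines that contain none of the words in the list.
Here is my word list: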
words_to_check = ['BUILDING','LAKE','TREE']
Here is the path to my text file:
path_to_file = "/Users/name/Desktop/path_to_file"
Sample file contents:
07/08/2013,"BUILDING",,100.00
07/08/2013,"LAKE",,50.00
07/08/2013,"TREE",,5.50
07/08/2013,"CAT",,10.50
07/08/2013,"DOG",,-19.87
Here is the method that builds the hash:
def build_hash(path_to_file, words_to_check)
  trans_info = {
    :found => {},
    :unfound => {}
  }
  found = trans_info[:found]
  unfound = trans_info[:unfound]
  words_to_check.each do |word|
    found[word] = []
    unfound[:unfound] = []
    File.foreach(path_to_file) do |line|
      if line.include?(word)
        date = /(?<Month>\d{1,2})\D(?<Day>\d{2})\D(?<Year>\d{4})/.match(line).to_s
        transaction = /(?<transaction>)#{word}/.match(line).to_s
        amount = /-+(?<dollars>\d+)\.(?<cents>\d+)/.match(line).to_s.to_f.round(2)
        # found word on list now push to array with hash keys
        found[word] << {
          date: date,
          transaction: transaction,
          amount: amount
        }
      else
        date = /(?<Month>\d{1,2})\D(?<Day>\d{2})\D(?<Year>\d{4})/.match(line).to_s
        transaction = /(?<Middle>)".*"/.match(line).to_s
        amount = /-*(?<dollars>\d+)\.(?<cents>\d+)/.match(line).to_s.to_f.round(2)
        # push to unfound part of hash
        unfound[:unfound] << {
          date: date,
          transaction: transaction,
          amount: amount
        }
      end
    end
  end
  # found and unfound key/values will be returned
  return trans_info
end
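If you run this, you will see 'BUILDING', 'LAKE', 'TREE', 'CAT', and 'DOG' in :unfound.
Only 'CAT' and 'DOG' should be in :unfound.
This may look like a simple matter of else or conditional logic, but I have researched it and considered other data structures and still can't solve it. Any suggestions or new ideas are much appreciated!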
Answer 0 (score: 0)
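This has to do with the way your loops are set up. Because you check each word separately, you are effectively requiring a line to contain every word in the list in order to stay out of the :unfound category.
As an example, take a look at the first line of your data file.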
07/08/2013,"BUILDING",,100.00
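On the first pass through the words_to_check.each loop, this line is compared against the first word in the list, BUILDING. That is obviously a match, so the line is added to the :found category. However, there are still two more words to compare against. On the second pass, the same line is compared against the word LAKE; the match fails, and the line is added to the :unfound category. The same thing then happens with the word TREE. (In your code the whole file is scanned once per word rather than word by word per line, but the effect on each individual line is the same.)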
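To make that trace concrete, here is a minimal sketch that replays the three comparisons for the first data line on its own. The variable name placements is only illustrative and does not appear in the question's code.

```ruby
words_to_check = ['BUILDING', 'LAKE', 'TREE']
line = '07/08/2013,"BUILDING",,100.00'

# Replay what the word-outside / line-inside loops decide for this one line:
# each word makes its own independent found/unfound decision.
placements = words_to_check.map do |word|
  line.include?(word) ? :found : :unfound
end

p placements # => [:found, :unfound, :unfound]
```

One line, three decisions: it is filed once under :found and twice under :unfound, which is why matching lines still show up as unfound.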
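Because the file loop sits inside the word loop, you also end up reading the file several times, once per word. Since reading a file is slow, I would reverse the order of these loops; that is, I would put the word loop on the inside.
You probably want to structure your loops like this: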
File.foreach(path_to_file) do |line|
  line_does_match = false # assume that we start without a match
  words_to_check.each do |word| # check the current line against all words
    if line.include? word
      line_does_match = true # record that we have a match
      break # stop the words_to_check.each loop
    end
  end
  # Now that we've determined whether the line matches ANY of the
  # words in the list we can deal with it accordingly.
  if line_does_match
    # add it to the :found list
  else
    # add it to the :unfound list
  end
end
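For completeness, here is one way the loop skeleton above might be filled in. It is a sketch rather than a definitive rewrite of the question's method: it keeps the same hash layout, uses Enumerable#find in place of the flag-and-break loop, and simplifies the regexes. The constants DATE_PATTERN and AMOUNT_PATTERN are names introduced here, and the amount pattern assumes the amount is the only dotted number on the line (true for the sample data, not for dates written like 07.08.2013).

```ruby
require 'tempfile'

# Hypothetical helper patterns (not from the original post).
DATE_PATTERN = /\d{1,2}\D\d{2}\D\d{4}/ # e.g. 07/08/2013
AMOUNT_PATTERN = /-?\d+\.\d+/          # e.g. 100.00 or -19.87

def build_hash(path_to_file, words_to_check)
  trans_info = {
    found: words_to_check.map { |word| [word, []] }.to_h,
    unfound: { unfound: [] }
  }

  # Read the file once; test each line against the whole word list.
  File.foreach(path_to_file) do |line|
    # First listed word contained in the line, or nil when none matches.
    word = words_to_check.find { |candidate| line.include?(candidate) }

    entry = {
      date: line[DATE_PATTERN].to_s,
      # Unmatched lines keep the quoted field, as in the question's code.
      transaction: word || line[/".*"/].to_s,
      amount: line[AMOUNT_PATTERN].to_f.round(2)
    }

    if word
      trans_info[:found][word] << entry
    else
      trans_info[:unfound][:unfound] << entry
    end
  end

  trans_info
end

# Usage with the sample data from the question, written to a temp file.
sample = <<~DATA
  07/08/2013,"BUILDING",,100.00
  07/08/2013,"LAKE",,50.00
  07/08/2013,"TREE",,5.50
  07/08/2013,"CAT",,10.50
  07/08/2013,"DOG",,-19.87
DATA

result = Tempfile.create('transactions') do |file|
  file.write(sample)
  file.flush
  build_hash(file.path, ['BUILDING', 'LAKE', 'TREE'])
end

p result[:unfound][:unfound].map { |entry| entry[:transaction] }
```

With this structure only the CAT and DOG lines land in :unfound. Note that include? does substring matching, so a word such as TREE would also match a line containing STREET; whether that matters depends on your data.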