How can I show only the non-matching results when parsing a text file?

Time: 2013-10-08 22:21:43

Tags: ruby-on-rails ruby

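I have a method that searches the lines of a text file and stores them in a hash, based on a list of words.

The method does two simple things:

If a line matches a word, it parses the line with regular expressions and stores it in the "found" category; otherwise it stores the result in the "unfound" category.

My problem is with the "unfound" part: every line ends up in the unfound category. What I need is for the unfound category to contain only the lines that match none of the words in the list.

Here is my word list: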

words_to_check = ['BUILDING','LAKE','TREE']

Here is the path to my text file:

path_to_file = "/Users/name/Desktop/path_to_file" 

Sample file contents:

07/08/2013,"BUILDING",,100.00
07/08/2013,"LAKE",,50.00
07/08/2013,"TREE",,5.50
07/08/2013,"CAT",,10.50
07/08/2013,"DOG",,-19.87

Here is the method that builds the hash:

def build_hash(path_to_file, words_to_check)
  trans_info = {
    :found => {},
    :unfound => {}
  }

  found = trans_info[:found]
  unfound = trans_info[:unfound]

  words_to_check.each do |word|
    found[word] = []
    unfound[:unfound] = []

    File.foreach(path_to_file) do |line|
      if line.include?(word)
        date = /(?<Month>\d{1,2})\D(?<Day>\d{2})\D(?<Year>\d{4})/.match(line).to_s
        transaction = /(?<transaction>)#{word}/.match(line).to_s
        amount = /-+(?<dollars>\d+)\.(?<cents>\d+)/.match(line).to_s.to_f.round(2)

        # found word on list now push to array with hash keys
        found[word] << {
          date: date,
          transaction: transaction,
          amount: amount
        }
      else
        date = /(?<Month>\d{1,2})\D(?<Day>\d{2})\D(?<Year>\d{4})/.match(line).to_s
        transaction = /(?<Middle>)".*"/.match(line).to_s
        amount = /-*(?<dollars>\d+)\.(?<cents>\d+)/.match(line).to_s.to_f.round(2)

        # push to unfound part of hash
        unfound[:unfound] << {
          date: date,
          transaction: transaction,
          amount: amount
        }
      end
    end
  end
  # found and unfound key/values will be returned
  return trans_info
end

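If you run this, you'll see that 'BUILDING', 'LAKE', 'TREE', 'CAT', and 'DOG' all end up in :unfound. Only 'CAT' and 'DOG' should be in :unfound.

This may look like a simple matter of else or conditional logic, but I've researched it and considered other data structures and still can't solve it. Any suggestions or new ideas are greatly appreciated!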

1 Answer:

Answer 0: (score: 0)

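This has to do with the way you set up your loops. Because you check each word separately, you are effectively requiring a line to contain every word in the list in order to stay out of the :unfound category.

As an example, take a look at the first line of the data file.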

07/08/2013,"BUILDING",,100.00

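On the first pass through the words_to_check.each loop, that line is compared against the first word in the list, BUILDING. That is obviously a match, so the line gets added to the :found category. However, there are still two more words to compare. On the second pass through the loop, the same line is compared against the word LAKE, the match fails, and the line is added to the :unfound category. The same thing then happens for the word TREE. Every other line in the file goes through the same sequence of comparisons.

Because the file loop sits inside the word loop, you also end up reading the file multiple times, once per word. Since reading a file is comparatively slow, I would reverse the order of these loops. That is, I would put the word loop on the inside.

You probably want to structure your loops like this: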
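
To make that concrete, here is a minimal sketch that mimics the word-outside, line-inside nesting on an in-memory copy of the sample data. The lines array and the unfound_hits counter are stand-ins introduced for illustration; they are not part of the original method. The sketch only counts how often each line fails an include? check, which is how often the original else branch would fire for it.

```ruby
# Hypothetical in-memory stand-in for the sample file contents.
lines = [
  '07/08/2013,"BUILDING",,100.00',
  '07/08/2013,"LAKE",,50.00',
  '07/08/2013,"TREE",,5.50',
  '07/08/2013,"CAT",,10.50',
  '07/08/2013,"DOG",,-19.87'
]
words_to_check = ['BUILDING', 'LAKE', 'TREE']

# Count how many times each line would take the else branch.
unfound_hits = Hash.new(0)

words_to_check.each do |word|   # outer loop: one pass per word
  lines.each do |line|          # inner loop: every line, every pass
    unfound_hits[line] += 1 unless line.include?(word)
  end
end

unfound_hits.each do |line, count|
  puts "#{count} miss(es): #{line}"
end
```

Each line misses at least two of the three words, so under this nesting no line can stay out of the else branch.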

File.foreach(path_to_file) do |line|
  line_does_match = false # assume that we start without a match
  words_to_check.each do |word| # check the current line against all words
    if line.include? word
      line_does_match = true # record that we have a match
      break # stop the words_to_check.each loop
    end
  end
  # Now that we've determined whether the line matches ANY of the 
  # words in the list we can deal with it accordingly.
  if line_does_match
    # add it to the :found list
  else
    # add it to the :unfound list
  end
end
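
For completeness, here is one way the reversed loops might be combined with the parsing from the question. Treat it as a sketch under assumptions rather than a drop-in replacement: the method name classify_lines, the constants DATE_RE and AMOUNT_RE, and the sample_lines array are all names I made up, the amount pattern accepts an optional leading minus sign, and the surrounding quotes are stripped from unmatched transaction names. It uses Enumerable#find in place of the flag variable and break, and it takes any enumerable of lines so it can be tried without a file on disk.

```ruby
# Assumed patterns, adapted from the regexes in the question.
DATE_RE   = /\d{1,2}\D\d{2}\D\d{4}/
AMOUNT_RE = /-?\d+\.\d+/

# lines: any enumerable of strings (an Array, or File.foreach(path) without a block)
def classify_lines(lines, words_to_check)
  trans_info = {
    found: words_to_check.map { |word| [word, []] }.to_h,
    unfound: { unfound: [] }
  }

  lines.each do |line|
    # First word contained in the line, or nil when no word matches.
    matched_word = words_to_check.find { |word| line.include?(word) }

    entry = {
      date: line[DATE_RE].to_s,
      # Fall back to the quoted field when the line matched no word.
      transaction: matched_word || line[/"(.*)"/, 1].to_s,
      amount: line[AMOUNT_RE].to_f.round(2)
    }

    if matched_word
      trans_info[:found][matched_word] << entry
    else
      trans_info[:unfound][:unfound] << entry
    end
  end

  trans_info
end

# Hypothetical sample data mirroring the file in the question.
sample_lines = [
  '07/08/2013,"BUILDING",,100.00',
  '07/08/2013,"LAKE",,50.00',
  '07/08/2013,"TREE",,5.50',
  '07/08/2013,"CAT",,10.50',
  '07/08/2013,"DOG",,-19.87'
]

result = classify_lines(sample_lines, ['BUILDING', 'LAKE', 'TREE'])
result[:unfound][:unfound].each { |entry| puts entry.inspect }
```

To run it against the real file, File.foreach(path_to_file) called without a block returns an enumerator, so it can be passed as the lines argument and the file is still read only once. Note that include? is a plain substring test, so a word such as TREE would also match a line containing STREET; if that matters, comparing against the extracted quoted field would be stricter.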