Append strings to an array if found in a paragraph using `.match` in Ruby

Time: 2016-08-12 02:34:38

Tags: arrays ruby regex

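I am trying to search a paragraph for each word in an array, and then output a new array containing only the words that could be found.

But so far I have not been able to get the required output format.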

Currently the output I get is a vertical list of the printed words.

paragraph = "Japan is a stratovolcanic archipelago of 6,852 islands.
The four largest are Honshu, Hokkaido, Kyushu and Shikoku, which make up about ninety-seven percent of Japan's land area.
The country is divided into 47 prefectures in eight regions."

words_to_find = %w[ Japan archipelago fishing country ]

words_found = []

words_to_find.each do |w|
    paragraph.match(/#{w}/) ? words_found << w : nil
end

puts words_found

But what I want is Japan archipelago country

I don't have much experience matching text in paragraphs, and I'm not sure what I'm doing wrong here. Could anyone offer some guidance?

2 answers:

Answer 0: (score: 0)

This happens because you are using puts to print the array, and puts appends "\n" to the end of every element ("word"):

#!/usr/bin/env ruby
def run_me
    paragraph = "Japan is a stratovolcanic archipelago of 6,852 islands.
    the four largest are Honshu, Hokkaido, Kyushu and Shikoku, which make up about ninety-seven percent of Japan's land area.
    the country is divided into 47 prefectures in eight regions."

    words_to_find = %w[ Japan archipelago fishing country ]

    find_words_from_a_text_file paragraph, words_to_find
end

# words_to_find is taken as a plain array (no splat), so each word is tested on its own
def find_words_from_a_text_file(paragraph, words_to_find)
    words_found = []

    words_to_find.each do |w|
        paragraph.match(/#{w}/) ? words_found << w : nil
    end

    # print each element on its own line with each and puts
    words_found.each { |x| puts "with enum and puts : : #{x}" }

    # or just use print, which does not add a new line
    print "with print :"; print words_found, "\n"

    # or with p
    p words_found
end

run_me

Output:

za:ruby_dir za$ ./fooscript.rb 
with enum and puts : : Japan
with enum and puts : : archipelago
with enum and puts : : country
with print :["Japan", "archipelago", "country"]
["Japan", "archipelago", "country"]
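
As a compact illustration of the difference between puts, p and join, here is a minimal sketch. It assumes the same word list as the question and a shortened paragraph; Regexp.escape is an addition not present in the original code, included only so that words containing regex metacharacters are matched literally.

```ruby
paragraph = "Japan is a stratovolcanic archipelago of 6,852 islands. " \
            "The country is divided into 47 prefectures in eight regions."
words_to_find = %w[Japan archipelago fishing country]

# Keep only the words that occur somewhere in the paragraph.
words_found = words_to_find.select { |w| paragraph.match(/#{Regexp.escape(w)}/) }

puts words_found            # prints one element per line
p words_found               # prints the array in its inspect form
puts words_found.join(" ")  # prints the words on a single line
```

Here select replaces the each loop with the ternary, since it already returns a new array of the elements for which the block is truthy.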

Answer 1: (score: 0)

Here are two ways to do it. Both are case-insensitive.

Use a regular expression

# Interpolate the union's source rather than the Regexp itself: an embedded
# Regexp carries its own flags, (?-mix:...), which would switch off i inside it.
r = /
    \b                                          # Match a word break
    (?:#{ Regexp.union(words_to_find).source }) # Match any word in words_to_find
    \b                                          # Match a word break
    /xi                                         # Free-spacing regex definition mode (x)
                                                # and case-indifferent (i)
  #=> /
  #   \b                                          # Match a word break
  #   (?:Japan|archipelago|fishing|country) # Match any word in words_to_find
  #   \b                                          # Match a word break
  #   /ix

paragraph.scan(r).uniq
  #=> ["Japan", "archipelago", "country"]
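
The word breaks (\b) matter because a bare /#{w}/ also matches inside longer words. A minimal sketch of the difference, using "land" as a hypothetical search term that is not in the original word list:

```ruby
paragraph = "Japan is a stratovolcanic archipelago of 6,852 islands."
word = "land" # hypothetical search term, chosen only for illustration

# Without word breaks the pattern matches the "land" inside "islands".
substring_hit = paragraph.match(/#{Regexp.escape(word)}/)

# With word breaks it only matches "land" as a standalone word, so it finds nothing here.
whole_word_hit = paragraph.match(/\b#{Regexp.escape(word)}\b/)

p substring_hit
p whole_word_hit
```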

Intersect two arrays

words_to_find_hash = words_to_find.each_with_object({}) { |w,h| h[w.downcase] = w }
  #=> {"japan"=>"Japan", "archipelago"=>"archipelago", "fishing"=>"fishing",
  #    "country"=>"country"}

words_to_find_hash.values_at(*paragraph.delete(".;:,?'").
                               downcase.
                               split.
                               uniq & words_to_find_hash.keys)
  #=> ["Japan", "archipelago", "country"]
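
A variation on the same idea, offered only as a sketch: instead of deleting a fixed list of punctuation characters, tokenize the paragraph with scan and look each word up in a Set. The tokenizing pattern /[[:alpha:]]+/ is an assumption made here, not part of the answer above. It drops digits and splits "Japan's" into "japan" and "s".

```ruby
require "set"

paragraph = "Japan is a stratovolcanic archipelago of 6,852 islands. " \
            "The country is divided into 47 prefectures in eight regions."
words_to_find = %w[Japan archipelago fishing country]

# Lowercased alphabetic tokens of the paragraph, stored in a Set for fast lookup.
tokens = paragraph.downcase.scan(/[[:alpha:]]+/).to_set

# Keep the words whose lowercase form occurs as a whole token.
words_found = words_to_find.select { |w| tokens.include?(w.downcase) }

p words_found
```

Unlike the intersection above, this returns the words in the order of words_to_find rather than in the order they first appear in the paragraph.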