Question

How do I stop the duplicates in this code's output?

RE = /<("[^"]*"|'[^']*'|[^'">])*>/
TAG_RE = /<(.+?)>(.*?)<.+?>/

text = "<date>show</date> me the current conditions for <city> detroit <END>"
a = []

text.scan(TAG_RE).map { |w| a<< w; }

text.gsub(RE, '').split.each do |q|
    a.each_with_index do |v, i|
        if q == a[i].last.strip
            puts "#{q}\tB-#{a[i].first}"        
        else
            puts "#{q}\tO"          
        end

    end
end

OUTPUTS

show    B-date
show    O
me  O
me  O
the O
the O
current O
current O
conditions  O
conditions  O
for O
for O
detroit O
detroit B-city

I only want a single instance of each word that matches the condition, like this:

show    B-date
me  O
the  O
current   O
conditions   O
for  O
detroit B-city

Where can I put next in the loop?

EDIT
Is this code idiomatic Ruby?

text.gsub(RE, '').split.each do |q|
    a.each_with_index do |v, i|
        @a = a[i].last.strip # save in a variable    
        if @a == q
            puts "#{q}\tB-#{a[i].first}"    
            break # break inner loop if match found
        end
    end
    next if @a == q #skip current outer loop if match found
    puts "#{q}\tO"  
end

Answer 1

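The problem is that you are still iterating over your a for every word, and a is really a hash: a mapping between tags and words.

If you use the scan result as a lookup (find the single matching pair for each word) instead of looping over every pair and printing once per pair, you won't get duplicates:

RE = /<("[^"]*"|'[^']*'|[^'">])*>/
TAG_RE = /<(.+?)>(.*?)<.+?>/

text = "<date>show</date> me the current conditions for <city> detroit <END>"
a = text.scan(TAG_RE)

text.gsub(RE, '').split.each do |q|
  d = a.find { |p| p.last.strip == q }
  if d
    puts "#{q}\tB-#{d.first}"
  else
    puts "#{q}\tO"
  end
end

Output:

show    B-date
me      O
the     O
current O
conditions      O
for     O
detroit B-city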
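To make the diagnosis concrete: the question's loop emits one line per (word, pair) combination, so 7 words times 2 (tag, word) pairs gives 14 lines, while looking each word up once gives exactly 7. The sketch below reproduces both strategies as functions that return the lines instead of printing them; the names nested_lines and lookup_lines are hypothetical, introduced only for this comparison.

```ruby
RE = /<("[^"]*"|'[^']*'|[^'">])*>/
TAG_RE = /<(.+?)>(.*?)<.+?>/

# Question's strategy: every word is compared against every (tag, word) pair,
# and a line is emitted for each comparison, match or not.
def nested_lines(text)
  pairs = text.scan(TAG_RE)
  text.gsub(RE, '').split.flat_map do |q|
    pairs.map do |tag, word|
      q == word.strip ? "#{q}\tB-#{tag}" : "#{q}\tO"
    end
  end
end

# Answer's strategy: find the one matching pair per word, emit one line.
def lookup_lines(text)
  pairs = text.scan(TAG_RE)
  text.gsub(RE, '').split.map do |q|
    match = pairs.find { |_tag, word| word.strip == q }
    match ? "#{q}\tB-#{match.first}" : "#{q}\tO"
  end
end

text = "<date>show</date> me the current conditions for <city> detroit <END>"
puts nested_lines(text).size # 14: 7 words x 2 pairs
puts lookup_lines(text).size # 7: one line per word
```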

And, while we're at it, you could use a proper hash:

RE = /<("[^"]*"|'[^']*'|[^'">])*>/
TAG_RE = /<(.+?)>(.*?)<.+?>/

text = "<date>show</date> me the current conditions for <city> detroit <END>"
map = Hash[*text.scan(TAG_RE).flatten.map(&:strip)].invert

text.gsub(RE, '').split.each do |q|
  tag = map[q]
  if tag
    puts "#{q}\tB-#{tag}"
  else
    puts "#{q}\tO"
  end
end
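The map one-liner packs four steps together. A sketch that names each intermediate value (pairs, flat, by_tag and by_word are illustrative names, not from the answer), plus an alternative that builds the word => tag hash directly, assuming Ruby 2.6 or later for Array#to_h with a block:

```ruby
TAG_RE = /<(.+?)>(.*?)<.+?>/
text = "<date>show</date> me the current conditions for <city> detroit <END>"

pairs   = text.scan(TAG_RE)          # [["date", "show"], ["city", " detroit "]]
flat    = pairs.flatten.map(&:strip) # ["date", "show", "city", "detroit"]
by_tag  = Hash[*flat]                # {"date"=>"show", "city"=>"detroit"}
by_word = by_tag.invert              # {"show"=>"date", "detroit"=>"city"}

# Same lookup built directly, keyed by word.
by_word_direct = pairs.to_h { |tag, word| [word.strip, tag] }

# Where the two differ: a repeated tag. Hash[*...] is keyed by tag first,
# so the earlier <city> word is overwritten before invert ever runs.
dup  = "<city>a</city> and <city>b</city>".scan(TAG_RE)
lost = Hash[*dup.flatten].invert                  # {"b"=>"city"}
kept = dup.to_h { |tag, word| [word.strip, tag] } # {"a"=>"city", "b"=>"city"}
```

The splat-and-invert version only works for the example text because each tag appears once; keying by word from the start does not depend on that.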

Produces the same output.

EDIT: if you want a more Ruby-esque way, I would probably do something like this:

class Text
  TAGS_RE = /<("[^"]*"|'[^']*'|[^'">])*>/
  TAGS_WORDS_RE = /<(.+?)>\s*(.*?)\s*<.+?>/

  def self.strip_tags(text)
    text.gsub(TAGS_RE, '')
  end

  def self.tagged_words(text)
    matches = text.scan(TAGS_WORDS_RE)
    Hash[*matches.flatten].invert
  end
end

class Word
  def self.display(word, tag)
    puts "#{word}\t#{tag(tag)}"
  end

  # "B-<tag>" for tagged words, "O" (the letter, not zero) otherwise
  def self.tag(tag)
    tag ? "B-#{tag}" : "O"
  end
  # a plain "private" has no effect on "def self." methods
  private_class_method :tag
end

text = "<date>show</date> me the current conditions for <city> detroit <END>"

words_tag = Text.tagged_words(text)
Text.strip_tags(text).split.each do |word|
  tag = words_tag[word]
  Word.display(word, tag)
end

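Why?

I'm not that smart, and I'm lazy, so I prefer to write things as explicitly as possible. That is why I try to avoid loops.

Writing a loop is easy, but reading one is not, because you have to keep the context of what you have already read in your head while you continue reading and parsing the source code.

Loops with next and break are usually even harder to parse, because you have to track which code paths end the loop early.

Nested loops are harder still, because you have to track several contexts that change at different rates.

I believe the proposed version is easier to read, because each line can be understood on its own. There is very little context to remember from one line to the next.

The details are abstracted into methods, so if you only want the big picture, you can look at just the main part of the code:

words_tag = Text.tagged_words(text)
Text.strip_tags(text).split.each do |word|
  tag = words_tag[word]
  Word.display(word, tag)
end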
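To make the readability argument concrete, here is a sketch that puts a nested loop with break and next, modelled on the question's EDIT, next to a flat loop over a single word => tag hash. Both return the output lines instead of printing them; the function names are hypothetical, and Array#to_h with a block assumes Ruby 2.6 or later.

```ruby
RE = /<("[^"]*"|'[^']*'|[^'">])*>/
TAG_RE = /<(.+?)>(.*?)<.+?>/

# Nested style: the reader has to track where break leaves the inner loop
# and which path reaches next in the outer one.
def nested_with_break(text)
  pairs = text.scan(TAG_RE)
  lines = []
  text.gsub(RE, '').split.each do |q|
    found = nil
    pairs.each do |tag, word|
      next unless word.strip == q # not this pair, try the next one
      found = tag
      break # first matching pair wins
    end
    if found
      lines << "#{q}\tB-#{found}"
      next # tagged word handled, skip the untagged case
    end
    lines << "#{q}\tO"
  end
  lines
end

# Flat style: build the lookup once, then each line can be read on its own.
def flat_lookup(text)
  words_tag = text.scan(TAG_RE).to_h { |tag, word| [word.strip, tag] }
  text.gsub(RE, '').split.map do |word|
    tag = words_tag[word]
    tag ? "#{word}\tB-#{tag}" : "#{word}\tO"
  end
end

text = "<date>show</date> me the current conditions for <city> detroit <END>"
puts flat_lookup(text)
```

Both produce the same seven lines for this input; the flat version simply has no control flow that jumps out of the middle of a loop.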

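If you want the details of how it is done, look at how those methods are implemented. With this approach, implementation details don't leak into places that may not need them.

I think this is good practice in every programming language, not just Ruby.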

Ruby loop outputs duplicates
