Nokogiri captures the right CSS selector, but the selector changes

Posted: 2016-02-03 21:40:50

Tags: ruby css-selectors nokogiri

I'm writing a program that pulls our printers' status information from their web pages and outputs the important parts.

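Some of the CSS changes depending on the printer's maintenance/toner status, but what I need is to capture the Toner element rather than the Maintenance element.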

I have successfully captured the information with this code: print "Toner left: ", page.css('.hpConsumableBlockHeaderText')[1].text, "\n"

The problem is that this captures only the 36% instead of the 26%.

Examples: Maintenance, Toner

Note that both are in the same kind of span, and I'm lost on how to capture one and not the other.
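One way to avoid depending on the position of the span is to pick the header by its label text instead of by index. The sketch below is a hypothetical helper, not code from the question: it works on plain strings, as you would get from page.css('.hpConsumableBlockHeaderText').map(&:text), and it assumes the toner header contains a word such as "Cartridge". The real label on the printer page is not shown in the question, so check it before relying on this.

```ruby
# Hypothetical helper: choose the consumable header whose text mentions a
# given label, rather than trusting its index in the node set.
# In the real script the input would be something like:
#   page.css('.hpConsumableBlockHeaderText').map(&:text)
def percent_for(header_texts, label)
  header = header_texts.find { |t| t.include?(label) }
  # Return nil when no header carries the label; otherwise pull out "NN%".
  header && header[/\d{1,3}%/]
end

# Made-up sample texts; "Black Cartridge" is an assumed label.
headers = ["Black Cartridge 26%", "Maintenance Kit 31%"]
puts percent_for(headers, "Cartridge")    # prints 26%
puts percent_for(headers, "Maintenance")  # prints 31%
```

Selecting by label keeps working if the page reorders its consumable blocks, at the cost of depending on the label wording instead.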

Example usage:

[]$ ruby clean_printer laser15
Toner left: 
Maintenance Kit 31%
110V-Q5421A, 220V-Q5422A

[]$ 
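The unreadable characters between "Maintenance Kit" and "31%" in the console output are most likely non-breaking spaces (&nbsp; in the page, U+00A0 in the extracted text) that the terminal could not display. That is an assumption, not something the question confirms. If it holds, a small cleanup step such as this hypothetical clean_text helper would normalize them. Ruby's \s matches only ASCII whitespace, but the POSIX class [[:space:]] also matches U+00A0 in UTF-8 strings.

```ruby
# Hypothetical cleanup helper. Assumes the garbled bytes are U+00A0
# (non-breaking spaces). [[:space:]] matches them; \s would not.
def clean_text(text)
  # Collapse any run of Unicode whitespace into a single ASCII space.
  text.gsub(/[[:space:]]+/, " ").strip
end

raw = "Maintenance Kit\u00A0\u00A031%\n"
puts clean_text(raw)  # prints: Maintenance Kit 31%
```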

Source (some details have been removed for security):

#!/usr/local/bin/ruby

require 'colored'
require 'nokogiri'
require 'restclient'

class CleanPrinter

  attr_accessor :printer, :amount

  def initialize(printer, amount)
    @printer = printer
    @amount = amount.to_i
  end

  def check_argv
    if ARGV[0] == nil || ARGV[1] == nil
      puts <<-EOF.yellow.bold

      USAGE: clean_printer <printer-name> <number-of-copies>
      EOF
    else
      send_print_jobs
    end
  end

  def create_jobs
    system("lp -d #{@printer} test.txt")
  end

  def send_print_jobs
    @amount.times do
      create_jobs
    end
  end

  def parse_4100
    page = Nokogiri::HTML(RestClient.get("#{@printer}.com"))
    #page.css('font').each_with_index { |e,i| puts "Matched at #{i}" if e.text =~ /6%/ } <= Used to find the correct selector
    print "Toner left: ", page.css('font')[28].to_s[/\d[%]/], "\n"
    powersave = page.css('td')[9].to_s[/(?<=POWERSAVE\ )\w+(?=<)/]
    powersave == "ON" ? (puts "Powersave Mode: ON") : (puts "Powersave Mode: OFF")
  end

  def parse_4350
    page = Nokogiri::HTML(RestClient.get("#{@printer}.com/hp/device/this.LCDispatcher"))
    #page.css('.hpConsumableBlockHeaderText').each_with_index { |e,i| puts "Matched at #{i}" if e.text =~ /26%/ }
    print "Toner left: ",  page.css('.hpConsumableBlockHeaderText')[1].text, "\n"
  end

  def parse_brother
  end
end

mr_clean = CleanPrinter.new(ARGV[0], ARGV[1])
mr_clean.parse_4350
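The powersave check in parse_4100 relies on lookaround assertions. The sketch below isolates that regex so its behavior is easy to see. The sample <td> markup is made up, because the real page layout is not shown in the question.

```ruby
# (?<=POWERSAVE\ ) requires "POWERSAVE " immediately before the match and
# (?=<) requires a "<" right after it, so only the word in between
# (e.g. "ON" or "OFF") is returned. Neither assertion consumes characters.
POWERSAVE_RE = /(?<=POWERSAVE\ )\w+(?=<)/

def powersave_state(td_html)
  td_html[POWERSAVE_RE]  # String#[] returns the first match, or nil
end

# Made-up sample markup.
puts powersave_state("<td>POWERSAVE ON</td>")   # prints ON
puts powersave_state("<td>POWERSAVE OFF</td>")  # prints OFF
p    powersave_state("<td>READY</td>")          # prints nil
```

Note that in parse_4100 a nil result (no match, for example because the wrong td index was used) falls into the else branch and is reported as "Powersave Mode: OFF".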

Update

I found that using this regex, [/\d{1,3}[%]/], captures the 31% from Maintenance:

[]$ ruby clean_printer laser15
Toner left: 31%
[]$ 
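The difference between the two patterns is the digit count. /\d[%]/ (used in parse_4100) matches exactly one digit followed by "%", while /\d{1,3}[%]/ accepts one to three digits. The character class [%] is equivalent to a plain %. The regex only fixes how many digits are kept. It still reads whichever element [1] points at, which is why the result above is the Maintenance value.

```ruby
# Sample header text taken from the question's console output.
header = "Maintenance Kit 31%"

# One digit followed by "%": the first such spot is "1%", so the "3" is lost.
puts header[/\d[%]/]         # prints 1%

# One to three digits followed by "%": covers 0% through 100%.
puts header[/\d{1,3}%/]      # prints 31%
puts "100%"[/\d{1,3}%/]      # prints 100%
puts "Toner 5%"[/\d{1,3}%/]  # prints 5%
```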

1 answer:

Answer (score: 0)

Presumably page.css('.hpConsumableBlockHeaderText') returns two elements, but you are using page.css('.hpConsumableBlockHeaderText')[1], which returns the second element, when you want the first one. Try this:

print "Toner left: ", page.css('.hpConsumableBlockHeaderText')[0].text, "\n"