How do I get my XPath query to loop its output into new cells

Asked: 2012-07-30 12:29:37

Tags: ruby xpath nokogiri

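I'm trying to get my XPath query to write its output into new cell rows, but I haven't succeeded so far. I want the output to run down a column, one item per row, rather than across row 1 in columns A, B, C.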

My full code is at https://gist.github.com/3205801

Would it be better to use Axlsx or the standard CSV library?

require 'nokogiri'
require 'open-uri'
require 'csv'

# Set encoding options to remove nasty trademark symbols
encoding_options = {
  :invalid           => :replace,  # Replace invalid byte sequences
  :undef             => :replace,  # Replace anything not defined in ASCII
  :replace           => '',        # Use a blank for those replacements
  :universal_newline => true       # Always break lines with \n
}

# URI.open comes from open-uri; on older Rubies a plain open(url) was used here
doc = Nokogiri::HTML(URI.open("http://h10010.www1.hp.com/wwpc/ie/en/ho/WF06b/321957-321957-3329742-89318-89318-5186820-5231694.html?dnr=1"))
# Replace each <br> with a ;
doc.css('br').each{ |br| br.replace ';' }

clues = Array.new
clues << 'Operating system'
clues << 'Processors'

CSV.open("output.csv", "wb") do |csv|
  # 1. Output the clues as the header row
  # 2. Scrape each value and force ASCII encoding to remove special characters
  csv << clues
  csv << clues.map{|clue| doc.at("//td[text()='#{clue}']/following-sibling::td").text.strip.encode('ASCII', **encoding_options)}
end
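
The encoding step in the code above can be checked on its own, without fetching the page. This is only a minimal sketch: the helper name `to_ascii` and the sample string are made up for illustration, and the options are passed with `**` so that they are treated as keyword arguments.

```ruby
# Same options as in the question, kept in a constant so they can be reused
ENCODING_OPTIONS = {
  invalid: :replace,        # Replace invalid byte sequences
  undef: :replace,          # Replace anything not defined in ASCII
  replace: '',              # Use an empty string for those replacements
  universal_newline: true   # Always break lines with \n
}.freeze

# Hypothetical helper: strip surrounding whitespace, then drop non-ASCII characters
def to_ascii(text)
  text.strip.encode('ASCII', **ENCODING_OPTIONS)
end

# Sample input with a registered-trademark and a trademark symbol
puts to_ascii("  Genuine Windows\u00AE 7 Home Premium\u2122 ")
```

With the sample string, the two trademark symbols are dropped and the rest of the text is left alone.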

1 Answer:

Answer 0 (score: 0)

I'm not sure I understand the question, but I think you want the data laid out like this:

header1,value1
header2,value2
header3,value3

instead of:

header1,header2,header3
value1,value2,value3

If so, you could do this:

CSV.open("output.csv", "wb") do |csv|
  clues.each do |one_clue|
    xpath = "//td[text()='#{one_clue}']/following-sibling::td"
    value = doc.at(xpath).text.strip.encode('ASCII', **encoding_options)
    # csv << expects an array: header in column A, scraped value in column B
    csv << [one_clue, value]
  end
end
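
If the headers and values have already been collected as two parallel arrays, `Array#zip` should give the same one-pair-per-row layout. Here is a sketch that uses placeholder values instead of scraped ones; the `values` array and the `rows_to_csv` helper are assumptions for illustration and are not part of the original code.

```ruby
require 'csv'

# Hypothetical helper: pair each header with its value, one pair per CSV row
def rows_to_csv(headers, values)
  CSV.generate do |csv|
    headers.zip(values).each { |pair| csv << pair }
  end
end

clues  = ['Operating system', 'Processors']
values = ['value1', 'value2'] # placeholders, not real scraped data

puts rows_to_csv(clues, values)
```

`CSV.generate` builds the text in memory, which makes the layout easy to inspect. Swapping it for `CSV.open("output.csv", "wb")` would write the same rows to a file.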