How do I extract only the TD values with Nokogiri?

Time: 2016-09-01 23:52:31

Tags: ruby nokogiri

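I am scraping the Forex Factory calendar. I can puts the td values, but it prints the full HTML markup. How do I get just the value of the td tag with the class "calendar__time"?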

My code is:

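# Gem names are lowercase; 'HTTParty' only loads on case-insensitive filesystems.
require 'httparty'
require 'nokogiri'
require 'pry'
require 'csv'

page = HTTParty.get('http://www.forexfactory.com/calendar.php?day=aug31.2016')
# Parse the response body string; naming the document "doc" avoids shadowing Kernel#p.
doc = Nokogiri::HTML(page.body)
rows = doc.css('tr.calendar_row')
rows.each do |row|
  # Each call returns a NodeSet, so puts prints the cells' HTML markup.
  puts row.css('td.calendar__date')
  puts row.css('td.calendar__time')
end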

When I check it in irb, it returns the tags:

     <td class="calendar__cell calendar__time time">9:45pm</td><a href="javascript:void(0);" class="calendarexpanded__graph" data-touchable><span>Graph</span></a> </td>
            </tr>
         </tbody>
      </table>
   </td>
</tr>
    <td class="calendar__cell calendar__date date"></td>

The HTML snippet for this TR is:

<tr class="calendar__row calendar_row calendar__row--grey " data-eventid="62529" data-touchable>
   <td class="calendar__cell calendar__date date"></td>
   <td class="calendar__cell calendar__time time">2:00am</td>
   <td class="calendar__cell calendar__currency currency">CHF</td>
   <td class="calendar__cell calendar__impact impact calendar__impact calendar__impact--low">
      <div class="calendar__impact-icon calendar__impact-icon--screen"> <span title="Low Impact Expected" class="low"></span> </div>
      <div class="calendar__impact-icon calendar__impact-icon--print"> <img src="resources/images/icons/impact/impact-yellow.png" alt="" border="0" /> </div>
   </td>
   <td class="calendar__cell calendar__event event">
      <div> <span class="calendar__event-title">UBS Consumption Indicator</span> </div>
   </td>
   <td class="calendar__cell calendar__detail detail"><a class="calendar__detail-link calendar__detail-link--level-1 calendar_detail level1" data-level="1"></a></td>
   <td class="calendar__cell calendar__actual actual">1.32</td>
   <td class="calendar__cell calendar__forecast forecast"></td>
   <td class="calendar__cell calendar__previous previous"><span class="revised worse" title="Revised From 1.34">1.21</span></td>
   <td class="calendar__cell calendar__graph graph"><a class="calendar__detail-link calendar__detail-link--graph-icon calendar_chart"></a></td>
</tr>
<tr class="calendar__row calendar__expand  " data-eventid="62529">
   <td>&nbsp;</td>
   <td colspan="4" class="calendarexpanded__container">
      <table class="calendarexpanded">
         <tbody>
            <tr>
               <td class="calendarexpanded__cell"><strong>Actual</strong>1.32</td>
               <td class="calendarexpanded__cell"><strong>Forecast</strong>&nbsp;</td>
               <td class="calendarexpanded__cell"><strong>Previous</strong><span class="revised worse" title="Revised From 1.34">1.21</span></td>
               <td class="calendarexpanded__cell calendarexpanded__cell--small"> <a href="javascript:void(0);" class="calendarexpanded__details calendarexpanded__details--1" data-touchable><span>Details</span></a> </td>
               <td class="calendarexpanded__cell calendarexpanded__cell--small"> 

1 Answer:

Answer 0 (score: 0)

I got it working by changing it as follows:

puts row.css('td.calendar__date').text