Looping to scrape multiple elements on the same page while storing them separately

Time: 2014-06-12 11:29:50

Tags: xpath scrapy

I want to scrape multiple product names from a single page using Scrapy. The page's HTML is:
<!-- body_text //-->

    <td width="601" valign="top">

      <table border="0" width="100%" cellspacing="0" cellpadding="0">

        <tr>

          <td><img src="images/pixel_trans.gif" border="0" alt="" width="100%" height="10"></td>

        </tr>

       <tr>

         <td class="pageHeading">Pool (Pocket Billiards) Table</td>

        </tr>

        <tr>

          <td><img src="images/pixel_trans.gif" border="0" alt="" width="100%" height="10"></td>

        </tr>

        <tr>

          <td class="main">A Victoria table is more than mere wood and slate. By paying attention to the details - the hidden differences - Victoria tables have become known name as masterpieces of original design and craftmanship, and most prestigious name in billiards.<br><br>



          These tables, available in two sizes  9’ X 4.5’ and 8’ X 4’, are made of frames with selected good quality solid wood and finely crafted rose wood legs with Mahagony polish.<br><br>

Slate Beds used are either Indian Bangalore Black Slate or Imported Slate. Slates are covered with worsted wool cloth optionally from Jupiter (China) or Strachan (West of England cloth, U.K.) to have proper speed, accuracy and responsiveness of the table to spin. Chrome nuts and adjusters  are used for leveling. It is surrounded with standard imported vulcanized 'L' shaped or 'V' shaped rubber cushions or Northern Cushions (Made in England) to cause billiard balls to rebound while minimizing the lose of kinetic energy.</td>

        </tr>



            <tr>

              <td><img src="images/pixel_trans.gif" border="0" alt="" width="100%" height="10"></td>

            </tr>

            <tr>

              <td>

                <table cellpadding="4" cellspacing="0" width="100%" border="0" class="product_box">

                  <tr>

                    <td width="50%" valign="top" class="product_name" colspan="2"><strong><a name="vs20b"></a>VS-20B</strong></td>

                  </tr>

                </table>

                <table cellpadding="4" cellspacing="4" width="100%" border="0" >

                  <tr>

                    <td width="60%" valign="top" class="product_text"><ul><li><strong>Size: 9&lsquo; X 4.5&lsquo;</strong></li><li>Rose Wood Legs</li><li>Mahgony Polish</li><li>S.B. Frame</li><li><strong>Bangalore Slate</strong></li><li>Standard Accessories</li></ul></td>

                    <td width="40%" align="center"><a href="javascript:popupWindow('images/products/vs-20bbig.jpg')"><img src="images/products/vs-20b.jpg" alt="VS-20B" border="0" width="250px"></a></td>

                  </tr>

                </table>

              </td>                 

            </tr>



            <tr>

              <td><img src="images/pixel_trans.gif" border="0" alt="" width="100%" height="10"></td>

            </tr>

            <tr>

              <td>

                <table cellpadding="4" cellspacing="0" width="100%" border="0" class="product_box">

                  <tr>

                    <td width="50%" valign="top" class="product_name" colspan="2"><strong><a name="vs20b"></a>VS-20C</strong></td>

                  </tr>

                </table>

                <table cellpadding="4" cellspacing="4" width="100%" border="0" >

                  <tr>

                    <td width="60%" valign="top" class="product_text"><ul><li><strong>Size: 8&lsquo; X 4&lsquo;</strong></li><li>Rose Wood Legs</li><li>Mahgony Polish</li><li>S.B. Frame</li><li><strong>Bangalore Slate</strong></li><li>Standard Accessories</li></ul></td>

                    <td width="40%" align="center"><a href="javascript:popupWindow('images/products/vs-20cbig.jpg')"><img src="images/products/vs-20c.jpg" alt="VS-20C" border="0" width="250px"></a></td>

                  </tr>

                </table>

              </td>                 

            </tr>



            <tr>

              <td><img src="images/pixel_trans.gif" border="0" alt="" width="100%" height="10"></td>

            </tr>

            <tr>

              <td>

                <table cellpadding="4" cellspacing="0" width="100%" border="0" class="product_box">

                  <tr>

                    <td width="50%" valign="top" class="product_name" colspan="2"><strong><a name="vs23b"></a>VS-23B</strong></td>

                  </tr>

                </table>

                <table cellpadding="4" cellspacing="4" width="100%" border="0" >

                  <tr>

                    <td width="60%" valign="top" class="product_text"><ul><li><strong>Size: 9&lsquo; X 4.5&lsquo;</strong></li><li>Rose Wood Legs</li><li>Mahgony Polish</li><li>S.A.L. Frame</li><li><strong>Imported Slate</strong></li><li>Standard Accessories</li></ul></td>

                    <td width="40%" align="center"><a href="javascript:popupWindow('images/products/vs-23bbig.jpg')"><img src="images/products/vs-23b.jpg" alt="VS-23B" border="0" width="250px"></a></td>

                  </tr>

                </table>

              </td>                 

            </tr>



            <tr>

              <td><img src="images/pixel_trans.gif" border="0" alt="" width="100%" height="10"></td>

            </tr>

            <tr>

              <td>

                <table cellpadding="4" cellspacing="0" width="100%" border="0" class="product_box">

                  <tr>

                    <td width="50%" valign="top" class="product_name" colspan="2"><strong><a name="vs23b"></a>VS-23C</strong></td>

                  </tr>

                </table>

                <table cellpadding="4" cellspacing="4" width="100%" border="0" >

                  <tr>

                    <td width="60%" valign="top" class="product_text"><ul><li><strong>Size: 8&lsquo; X 4&lsquo;</strong></li><li>Rose Wood Legs</li><li>Mahgony Polish</li><li>S.A.L. Frame</li><li><strong>Imported Slate</strong></li><li>Standard Accessories</li></ul></td>

                    <td width="40%" align="center"><a href="javascript:popupWindow('images/products/vs-23cbig.jpg')"><img src="images/products/vs-23c.jpg" alt="VS-23C" border="0" width="250px"></a></td>

                  </tr>

                </table>

              </td>                 

            </tr>



            <tr>

              <td><img src="images/pixel_trans.gif" border="0" alt="" width="100%" height="10"></td>

            </tr>

            <tr>

              <td>

                <table cellpadding="4" cellspacing="0" width="100%" border="0" class="product_box">

                  <tr>

                    <td width="50%" valign="top" class="product_name" colspan="2"><strong><a name="vs9"></a>VS-9</strong></td>

                  </tr>

                </table>

                <table cellpadding="4" cellspacing="4" width="100%" border="0" >

                  <tr>

                    <td width="60%" valign="top" class="product_text"><ul><li><strong>Size: 9&lsquo; X 4.5&lsquo;</strong></li><li>Auto Ball Return System</li><li>Pro Speed Cloth</li><li>American Pocket Size</li><li>Standard Accessories</li></ul></td>

                    <td width="40%" align="center"><a href="javascript:popupWindow('images/products/vs-9big.jpg')"><img src="images/products/vs-9.jpg" alt="VS-9" border="0" width="250px"></a></td>

                  </tr>

                </table>

              </td>                 

            </tr>



            <tr>

              <td><img src="images/pixel_trans.gif" border="0" alt="" width="100%" height="10"></td>

            </tr>

            <tr>

              <td>

                <table cellpadding="4" cellspacing="0" width="100%" border="0" class="product_box">

                  <tr>

                    <td width="50%" valign="top" class="product_name" colspan="2"><strong><a name="vs7"></a>VS-7</strong></td>

                  </tr>

                </table>

                <table cellpadding="4" cellspacing="4" width="100%" border="0" >

                  <tr>

                    <td width="60%" valign="top" class="product_text"><ul><li><strong>PLAYING AREA : 88" X 44"</strong></li><li><strong>OVERALL SIZE : 98"L X 54" W X 31" H</strong></li><li>Solid oak for top/brand rails, Dark cherry finish</li><li>Rams head solid rubber wood with # 6 leather drop pocket.  Easy assembly</li></ul></td>

                    <td width="40%" align="center"><a href="javascript:popupWindow('images/products/vs-7big.jpg')"><img src="images/products/vs-7.jpg" alt="VS-7" border="0" width="250px"></a></td>

                  </tr>

                </table>

              </td>                 

            </tr>



            <tr>

              <td><img src="images/pixel_trans.gif" border="0" alt="" width="100%" height="10"></td>

            </tr>

            <tr>

              <td>

                <table cellpadding="4" cellspacing="0" width="100%" border="0" class="product_box">

                  <tr>

                    <td width="50%" valign="top" class="product_name" colspan="2"><strong><a name="vs8"></a>VS-8/Light Oak</strong></td>

                  </tr>

                </table>

                <table cellpadding="4" cellspacing="4" width="100%" border="0" >

                  <tr>

                    <td width="60%" valign="top" class="product_text"><ul><li><strong>POOL TABLE : 8&lsquo; X 4&lsquo;</strong></li><li><strong>PLAYING AREA : 88" X 44"</strong></li><li><strong>OVERALL SIZE : 98" X 54"W X 31"H</strong></li><li>Solid oak for top/brand rails, Light oak finish</li><li>Rams head solid rubber wood with # 6 leather drop pocket, Easy assembly</li></ul></td>

                    <td width="40%" align="center"><a href="javascript:popupWindow('images/products/vs-8big.jpg')"><img src="images/products/vs-8.jpg" alt="VS-8/Light Oak" border="0" width="250px"></a></td>

                  </tr>

                </table>

              </td>                 

            </tr>



            <tr>

              <td><img src="images/pixel_trans.gif" border="0" alt="" width="100%" height="10"></td>

            </tr>

            <tr>

              <td>

                <table cellpadding="4" cellspacing="0" width="100%" border="0" class="product_box">

                  <tr>

                    <td width="50%" valign="top" class="product_name" colspan="2"><strong><a name="vs12"></a>VS-12</strong></td>

                  </tr>

                </table>

                <table cellpadding="4" cellspacing="4" width="100%" border="0" >

                  <tr>

                    <td width="60%" valign="top" class="product_text"><ul><li><strong>POOL TABLE : 8&lsquo; X 4&lsquo;</strong></li><li><strong>PLAYING AREA : 88" X 44"</strong></li><li><strong>OVERALL SIZE : 99-3/4"L X 55 - 3/4" W X 31" H</strong></li><li>Black laminate, pedestal legs, with drop pocket, Steel frame Easy assembly. Accessories included.</li></ul></td>

                    <td width="40%" align="center"><a href="javascript:popupWindow('images/products/vs-12big.jpg')"><img src="images/products/vs-12.jpg" alt="VS-12" border="0" width="250px"></a></td>

                  </tr>

                </table>

              </td>                 

            </tr>



            <tr>

              <td><img src="images/pixel_trans.gif" border="0" alt="" width="100%" height="10"></td>

            </tr>

            <tr>

              <td>

                <table cellpadding="4" cellspacing="0" width="100%" border="0" class="product_box">

                  <tr>

                    <td width="50%" valign="top" class="product_name" colspan="2"><strong><a name="vs10"></a>VS-10</strong></td>

                  </tr>

                </table>

                <table cellpadding="4" cellspacing="4" width="100%" border="0" >

                  <tr>

                    <td width="60%" valign="top" class="product_text"><ul><li><strong>POOL TABLE : 8&lsquo; X 4&lsquo;</strong></li><li><strong>PLAYING AREA : 88" X 44"</strong></li><li><strong>OVERALL SIZE : 98" L X 54"W X 31"H</strong></li><li>Solid oak for top/brand rails, oak finish</li><li>Rams head solid rubber wood with # 6 leather drop pocket, Easy assembly</li></ul></td>

                    <td width="40%" align="center"><a href="javascript:popupWindow('images/products/vs-10big.jpg')"><img src="images/products/vs-10.jpg" alt="VS-10" border="0" width="250px"></a></td>

                  </tr>

                </table>

              </td>                 

            </tr>



            <tr>

              <td><img src="images/pixel_trans.gif" border="0" alt="" width="100%" height="10"></td>

            </tr>

            <tr>

              <td>

                <table cellpadding="4" cellspacing="0" width="100%" border="0" class="product_box">

                  <tr>

                    <td width="50%" valign="top" class="product_name" colspan="2"><strong><a name="vs11"></a>VS-11</strong></td>

                  </tr>

                </table>

                <table cellpadding="4" cellspacing="4" width="100%" border="0" >

                  <tr>

                    <td width="60%" valign="top" class="product_text"><ul><li><strong>POOL TABLE : 8&lsquo; X 4&lsquo;</strong></li><li><strong>PLAYING AREA : 88" X 44"</strong></li><li><strong>OVERALL SIZE : 100" X 56"</strong></li><li>Solid wood for top/brand rails</li><li>Mahogany finish</li><li>Rams head solid rubber with # 6 leather drop pocket</li></ul></td>

                    <td width="40%" align="center"><a href="javascript:popupWindow('images/products/vs-11big.jpg')"><img src="images/products/vs-11.jpg" alt="VS-11" border="0" width="250px"></a></td>

                  </tr>

                </table>

              </td>                 

            </tr>



            <tr>

              <td><img src="images/pixel_trans.gif" border="0" alt="" width="100%" height="10"></td>

            </tr>

            <tr>

              <td>

                <table cellpadding="4" cellspacing="0" width="100%" border="0" class="product_box">

                  <tr>

                    <td width="50%" valign="top" class="product_name" colspan="2"><strong><a name="vs13"></a>VS-13</strong></td>

                  </tr>

                </table>

                <table cellpadding="4" cellspacing="4" width="100%" border="0" >

                  <tr>

                    <td width="60%" valign="top" class="product_text"><ul><li><strong>POOL TABLE : 8&lsquo; X 4&lsquo;</strong></li><li><strong>PLAYING AREA : 88" X 44"</strong></li><li><strong>OVERALL SIZE : 100" X 56"</strong></li><li>Solid wood for top/brand rails,</li><li>Dark cherry finish</li><li>Rams head solid rubber wood<br />
<br />
with # 6 leather drop pocket</li></ul></td>

                    <td width="40%" align="center"><a href="javascript:popupWindow('images/products/vs-13big.jpg')"><img src="images/products/vs-13.jpg" alt="VS-13" border="0" width="250px"></a></td>

                  </tr>

                </table>

              </td>                 

            </tr>


            <tr>

          <td><img src="images/pixel_trans.gif" border="0" alt="" width="100%" height="10"></td>

        </tr>

        <tr>

          <td>

            <table cellpadding="4" cellspacing="0" width="100%" border="0">

              <tr>

                <td width="50%" valign="top" class="product_name1" colspan="2"><strong>Standard Accessories for Pool</strong></td>

              </tr>

            </table>

            <table cellpadding="4" cellspacing="4" width="100%" border="0" class="product_box1">

              <tr>

                <td width="50%" valign="top" class="product_text">

                <ul>

                  <li>Aramith Pool Ball 2.1/4" or 2.1/16"</li>

                  <li>Table Brush</li>

                  <li>60" Rest Stick C/W Brass Cross Head Rest</li>

                  <li>Wall Cue Rack</li>

                </ul></td>

                <td width="50%" valign="top" class="product_text">

                <ul>

                  <li>Plastic Triangle</li>

                  <li>Triangle Chalk X 12 Pcs.</li>

                  <li>Pool House Cue X 4 Pcs.</li>

                  <li>Table Cover</li>

                  <li>Round Type Lamp Shade X 2 Pcs.</li>

                </ul></td>

              </tr>

            </table>

          </td>                 

        </tr>

    </table></td>

<!-- body_text_eof //-->

     <td width="45" valign="top">

      <table border="0" width="45" cellspacing="0" cellpadding="0">

<!-- right_navigation //-->

As the HTML above shows, the field I want to populate via XPath is: td[@class='product_name']/strong/a/@name

I also need to extract the image from this XPath: td[@align='center']/a/img/@src

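I export the data as CSV, and at the moment my scraper stores all the product names in a single cell. I am trying to get each product name and image URL stored in its own cell in the CSV.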

I tried using a loop but couldn't get it to work.
My code:

  def parse(self, response):
   hxs = HtmlXPathSelector(response)  
   titles = hxs.select("//head")
   items = []
   item = item()

   for i in range(0,5):

     item ["productname"] = titles.select("//td[@class='product_name'][i]/strong").extract()
     item ["imgurl"] = titles.select("//td[@align='center'][i]/a/img/@src").extract()


     items.append(item)
     return(items)

2 Answers:

Answer 0: (score: 3)

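names = hxs.xpath('//td[@class="product_name"]/strong/text()').extract()
imageurls = hxs.xpath('//tr/td[@align="center"]/a/img/@src').extract()
# Pair each name with its image URL and emit one item per product.
for name, url in zip(names, imageurls):
    item = ProductItem()  # placeholder name: use your own Item subclass here
    item["productname"] = name
    item["imgurl"] = url
    yield item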

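This is the simplest approach, because the names and image URLs are extracted in the same document order and therefore line up with each other. It relies on every product having exactly one name and one image: zip stops at the shorter list, and a missing image would shift every pair after it.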
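If the two lists might not line up, one alternative is to loop over each product's container and look up the name and image relative to it. The sketch below is only an illustration using the standard library's `xml.etree.ElementTree` on a trimmed, well-formed stand-in for the page (`PAGE` and `scrape_products` are made-up names, and the real page is HTML that would need an HTML-aware parser). In Scrapy the same idea would be a relative XPath (one starting with `.`) evaluated on each container selector.

```python
import xml.etree.ElementTree as ET

# Trimmed, well-formed stand-in for the page in the question (an assumption).
PAGE = """
<table>
  <tr><td>
    <table class="product_box"><tr>
      <td class="product_name"><strong><a name="vs20b"/>VS-20B</strong></td>
    </tr></table>
    <table><tr>
      <td align="center"><a href="#"><img src="images/products/vs-20b.jpg"/></a></td>
    </tr></table>
  </td></tr>
  <tr><td>
    <table class="product_box"><tr>
      <td class="product_name"><strong><a name="vs9"/>VS-9</strong></td>
    </tr></table>
    <table><tr>
      <td align="center"><a href="#"><img src="images/products/vs-9.jpg"/></a></td>
    </tr></table>
  </td></tr>
</table>
"""


def scrape_products(markup):
    """Return one dict per product, pairing name and image by container."""
    root = ET.fromstring(markup)
    products = []
    # Only the outer <td> cells have a <table> child: one cell per product.
    for cell in root.iterfind(".//td[table]"):
        anchor = cell.find(".//td[@class='product_name']/strong/a")
        image = cell.find(".//td[@align='center']/a/img")
        if anchor is None or image is None:
            continue  # skip incomplete products instead of mispairing them
        products.append({
            # The visible name is the text that follows the empty <a> tag.
            "productname": (anchor.tail or "").strip(),
            "imgurl": image.get("src"),
        })
    return products


for product in scrape_products(PAGE):
    print(product["productname"], product["imgurl"])
```

Because each lookup is scoped to one container, a product without an image is skipped rather than shifting the pairs that follow it.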

Answer 1: (score: 0)

You don't need to select the elements one at a time (by changing the i index in a loop). The path expression below:

//td[@class='product_name']/strong/a/@name

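already returns a node-set with one node per matching product. You only need to loop over the returned nodes and extract each attribute string.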
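As a rough illustration of looping over a node-set, the sketch below uses the standard library's `xml.etree.ElementTree` on a small well-formed snippet (`SNIPPET` and `anchor_names` are made-up names for this example). ElementTree's XPath subset cannot select `@name` directly, so the path stops at the `a` element and the attribute is read with `get`.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed excerpt modeled on the page in the question.
SNIPPET = """
<table>
  <tr><td class="product_name"><strong><a name="vs20b"/>VS-20B</strong></td></tr>
  <tr><td class="product_name"><strong><a name="vs23b"/>VS-23B</strong></td></tr>
  <tr><td class="product_name"><strong><a name="vs9"/>VS-9</strong></td></tr>
</table>
"""


def anchor_names(markup):
    """Return the name attribute of every product anchor, in document order."""
    root = ET.fromstring(markup)
    # One expression matches every product; no per-index query is needed.
    anchors = root.findall(".//td[@class='product_name']/strong/a")
    return [anchor.get("name") for anchor in anchors]


print(anchor_names(SNIPPET))  # ['vs20b', 'vs23b', 'vs9']
```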

As for the second expression:

//td[@align='center']/a/img/@src

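In the HTML shown it also matches once per product, so loop over its results the same way instead of extracting a single string.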
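To tie this back to the CSV goal in the question: once there is one name and one URL per product, writing one row per pair puts each value in its own cell. The sketch below uses only the standard library's `csv` module as a stand-in. It is not Scrapy's own exporter, and `rows_to_csv` is a made-up helper name.

```python
import csv
import io


def rows_to_csv(names, imageurls):
    """Write one CSV row per (name, url) pair and return the CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["productname", "imgurl"])
    # zip pairs the lists by position and stops at the shorter one.
    for name, url in zip(names, imageurls):
        writer.writerow([name, url])
    return buffer.getvalue()


names = ["VS-20B", "VS-20C"]
imageurls = ["images/products/vs-20b.jpg", "images/products/vs-20c.jpg"]
print(rows_to_csv(names, imageurls))
```

Storing the whole list in one field, as the code in the question does, would instead put every name in a single cell.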