Using XPath to extract the alt attribute in Python

Date: 2017-01-08 13:07:23

Tags: python html xpath

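I am trying to extract the 'alt' attribute of the images in the code block below, but only for images whose surrounding div has the class 'onIcon' (for example, Modelcontract or Kabeltelevisie).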



<tbody>
 <tr class="odd"><td><div class="roomdetail_icon onIcon Modelcontract"><a href="/nl/modelcontract"><img src="/sites/all/themes/kotweb/images/icons/grid/grid_modelcontract_on.png" alt="Modelcontract" /></a></div></td><td><div class="roomdetail_icon onIcon Kamer"><img src="/sites/all/themes/kotweb/images/icons/grid/grid_room_on.png" alt="Kamer" /></div></td><td><div class="roomdetail_icon offIcon Studio"><img src="/sites/all/themes/kotweb/images/icons/grid/grid_studio_off.png" alt="Studio" /></div></td><td><div class="roomdetail_icon offIcon Appartement"><img src="/sites/all/themes/kotweb/images/icons/grid/grid_apartment_off.png" alt="Appartement" /></div></td><td><div class="roomdetail_icon onIcon Internet"><img src="/sites/all/themes/kotweb/images/icons/grid/grid_internet_on.png" alt="Internet" /></div></td> </tr>
 <tr class="even"><td><div class="roomdetail_icon onIcon Kabeltelevisie"><img src="/sites/all/themes/kotweb/images/icons/grid/grid_cable_tv_on.png" alt="Kabeltelevisie" /></div></td><td><div class="roomdetail_icon onIcon Gemeenschappelijke leefruimte"><img src="/sites/all/themes/kotweb/images/icons/grid/grid_shared_living_space_on.png" alt="Gemeenschappelijke leefruimte" /></div></td><td><div class="roomdetail_icon onIcon Tuin/terras"><img src="/sites/all/themes/kotweb/images/icons/grid/grid_garden_on.png" alt="Tuin/terras" /></div></td><td><div class="roomdetail_icon onIcon Fietsenstalling"><img src="/sites/all/themes/kotweb/images/icons/grid/grid_bicycle_shed_on.png" alt="Fietsenstalling" /></div></td><td><div class="roomdetail_icon offIcon Beddengoed"><img src="/sites/all/themes/kotweb/images/icons/grid/grid_bedding_off.png" alt="Beddengoed" /></div></td> </tr>
 <tr class="odd"><td><div class="roomdetail_icon onIcon Keukengerei"><img src="/sites/all/themes/kotweb/images/icons/grid/grid_kitchen_utensils_on.png" alt="Keukengerei" /></div></td><td><div class="roomdetail_icon offIcon Muziekinstrumenten toegelaten"><img src="/sites/all/themes/kotweb/images/icons/grid/grid_musical_instruments_allowed_off.png" alt="Muziekinstrumenten toegelaten" /></div></td><td><div class="roomdetail_icon offIcon Roken niet toegelaten"><img src="/sites/all/themes/kotweb/images/icons/grid/grid_smoking_allowed_off.png" alt="Roken niet toegelaten" /></div></td><td><div class="roomdetail_icon offIcon Huisdieren wel/niet toegelaten"><img src="/sites/all/themes/kotweb/images/icons/grid/grid_animals_allowed_off.png" alt="Huisdieren wel/niet toegelaten" /></div></td><td><div class="roomdetail_icon offIcon Bemeubeld"><img src="/sites/all/themes/kotweb/images/icons/grid/grid_furnished_off.png" alt="Bemeubeld" /></div></td> </tr>
 <tr class="even"><td><div class="roomdetail_icon offIcon Toegankelijk voor rolstoelgebruikers"><img src="/sites/all/themes/kotweb/images/icons/grid/grid_wheelchair_accssible_off.png" alt="Toegankelijk voor rolstoelgebruikers" /></div></td><td><div class="roomdetail_icon offIcon Geschikt voor allergiepatienten"><img src="/sites/all/themes/kotweb/images/icons/grid/grid_allergies_off.png" alt="Geschikt voor allergiepatienten" /></div></td><td><div class="roomdetail_icon offIcon Verhuur aan niet-studenten"><img src="/sites/all/themes/kotweb/images/icons/grid/grid_non_students_off.png" alt="Verhuur aan niet-studenten" /></div></td><td><div class="roomdetail_icon offIcon Straatkant"><img src="/sites/all/themes/kotweb/images/icons/grid/grid_street_off.png" alt="Straatkant" /></div></td><td><div class="roomdetail_icon onIcon Niet aan straatkant"><img src="/sites/all/themes/kotweb/images/icons/grid/grid_notstreet_on.png" alt="Niet aan straatkant" /></div></td> </tr>
 <tr class="odd"><td><div class="roomdetail_icon onIcon Building regulations"><img src="/sites/all/themes/kotweb/images/icons/grid/grid_building_regulations_on.png" alt="Building regulations" /></div></td> </tr>
</tbody>

I am using XPath in Python and came up with the following query:

'features': response.xpath("//div[@class='onIcon']//img/@alt").extract()

Unfortunately, this returns an empty array ([]).

I have been stuck on this for quite a while: what am I doing wrong?

Kind regards, Thomas

1 answer:

Answer 0: (score: 0)

response.xpath("//div[@class[contains(., 'onIcon')]]//img/@alt")

The value of the class attribute is roomdetail_icon onIcon Modelcontract, not just onIcon, so you should use the contains function.

Inside contains, . refers to the current context node (@class).

Out:

['Modelcontract', 'Kamer', 'Internet', 'Kabeltelevisie', 'Gemeenschappelijke leefruimte', 'Tuin/terras', 'Fietsenstalling', 'Keukengerei', 'Niet aan straatkant', 'Building regulations']

Every time you evaluate [@class='onIcon'], XPath performs the following steps:

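  1. XPath sees the string 'onIcon', so it converts @class to a string so that the two can be compared.
  2. To convert @class to a string, there is the string() function; string(@class) returns the value of the class attribute.
  3. Finally, XPath compares roomdetail_icon onIcon Modelcontract with onIcon; the two are not equal, so the div is not selected.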
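
The comparison described in these steps can be reproduced with a rough, standard-library-only sketch. It is not the Scrapy code from the question: it assumes a shortened excerpt of the table (the SNIPPET string and its abbreviated src paths are made up for illustration) and uses xml.etree.ElementTree, whose limited XPath subset supports the [@class='onIcon'] predicate but has no contains() function, so the substring check is done in Python instead.

```python
import xml.etree.ElementTree as ET

# Shortened, illustrative excerpt of the table from the question
# (src paths abbreviated; this is an assumption, not the full page).
SNIPPET = """
<tbody>
 <tr class="odd">
  <td><div class="roomdetail_icon onIcon Modelcontract"><a href="/nl/modelcontract"><img src="grid_modelcontract_on.png" alt="Modelcontract" /></a></div></td>
  <td><div class="roomdetail_icon offIcon Studio"><img src="grid_studio_off.png" alt="Studio" /></div></td>
  <td><div class="roomdetail_icon onIcon Internet"><img src="grid_internet_on.png" alt="Internet" /></div></td>
 </tr>
</tbody>
"""

root = ET.fromstring(SNIPPET)

# Exact comparison: the whole attribute value would have to equal 'onIcon',
# but it is e.g. 'roomdetail_icon onIcon Modelcontract', so nothing matches.
exact = root.findall(".//div[@class='onIcon']")

# Substring check in Python, mirroring what contains(., 'onIcon') does in XPath.
on_alts = [
    img.get("alt")
    for div in root.iter("div")
    if "onIcon" in div.get("class", "")
    for img in div.iter("img")
]

print(exact)
print(on_alts)
```

A substring test also matches any class name that merely contains the text onIcon; if that matters, testing "onIcon" in div.get("class", "").split() compares whole class names instead.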