Getting null values in the output even though the value is not empty in scrapy shell

Time: 2019-10-25 14:44:28

Tags: python web-scraping scrapy


I am trying to scrape this link: https://www.chemicalbook.com/ProductChemicalPropertiesCB2909992_EN.htm. To get the product name, I am using:

response.css('.ProdSupplierGN_ProductA_2 .td1+ td a::text').get()

That selector returns a value in scrapy shell, but when I look at the output file scrapy.json, my data is:

{
   "link":"https://www.chemicalbook.com/ProductChemicalPropertiesCB2909992_EN.htm",
   "name":null,
   "cas":null,
   "synomym":[

   ],
   "molecular_formula":null,
   "molecular_weight":null,
   "einecs":null,
   "product_categories":[

   ],
   "melting_point":null,
   "vapor_pressure":[

   ],
   "form":null,
   "henry_law_constant":null,
   "stability":null,
   "inchikey":null,
   "hazard_codes":null,
   "risk_statements":null,
   "safety_statements":null,
   "wgk":null,
   "tsca":null,
   "packing_group":null,
   "hs_code":null,
   "hazardous_substance_data":null,
   "chemical_properties":null,
   "definition":null,
   "air_and_water_reactions":null,
   "general_description":null,
   "reactivity_profile":null,
   "fire_hazard":null
}
    def parse_chemi_link(self, response):
        items = ChemibookItem()

        #------------------------------BASIC INFORMATION    
        link = response.url
        name = response.css('.ProdSupplierGN_ProductA_2 .td1+ td a::text').get()
        synomym = response.css('.ProdSupplierGN_ProductA_2+ .ProdSupplierGN_ProductA_2 td+ td font::text').getall()
        items['link'] = link
        items['name'] = name
        items['synomym'] = synomym
        yield items


1 Answer:

Answer 0 (score: 0)

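I strongly recommend using an XPath expression for this task, because it lets you anchor on the label text of the cell instead of on its position or CSS classes: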

response.xpath('string(//td[.="Product Name:"]/following-sibling::td[1])').get()
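The idea of anchoring on the label cell can be tried offline with only the standard library. The following is a minimal sketch, not the site's real markup: the `SAMPLE` table, its values, and the helper name `value_for` are all made up for illustration. `xml.etree.ElementTree` supports only a subset of XPath and has no `following-sibling` axis, so the sketch selects the row that contains the label cell and takes its second `<td>`.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup; the real page's structure may differ.
SAMPLE = """
<table>
  <tr><td>CAS:</td><td>000-00-0</td></tr>
  <tr><td>Product Name:</td><td><a href="#">Example Compound</a></td></tr>
</table>
"""


def value_for(table, label):
    """Return the text of the cell next to the cell whose text is `label`."""
    # ElementTree lacks the following-sibling axis, so select the row that
    # contains the label cell and take that row's second <td> instead.
    cell = table.find(f".//tr[td='{label}']/td[2]")
    if cell is None:
        return None
    # itertext() also collects text nested in child elements such as <a>.
    return "".join(cell.itertext()).strip()


table = ET.fromstring(SAMPLE)
product_name = value_for(table, "Product Name:")
cas = value_for(table, "CAS:")
missing = value_for(table, "EINECS:")  # label not present in the sample
```

Because the lookup depends on the label text rather than on row order or class names, it keeps working if rows are reordered. In Scrapy itself the expression from the answer can be used as written, since Scrapy's selectors support the `following-sibling` axis and the `string()` function.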