I get null values even though the value is not empty in the Scrapy shell
I am trying to scrape this link: https://www.chemicalbook.com/ProductChemicalPropertiesCB2909992_EN.htm
To get the product name, I am using:
response.css('.ProdSupplierGN_ProductA_2 .td1+ td a::text').get()
But when I look at the output in scrapy.json, my data is:
{
  "link": "https://www.chemicalbook.com/ProductChemicalPropertiesCB2909992_EN.htm",
  "name": null,
  "cas": null,
  "synomym": [],
  "molecular_formula": null,
  "molecular_weight": null,
  "einecs": null,
  "product_categories": [],
  "melting_point": null,
  "vapor_pressure": [],
  "form": null,
  "henry_law_constant": null,
  "stability": null,
  "inchikey": null,
  "hazard_codes": null,
  "risk_statements": null,
  "safety_statements": null,
  "wgk": null,
  "tsca": null,
  "packing_group": null,
  "hs_code": null,
  "hazardous_substance_data": null,
  "chemical_properties": null,
  "definition": null,
  "air_and_water_reactions": null,
  "general_description": null,
  "reactivity_profile": null,
  "fire_hazard": null
}
This is my parse callback:

def parse_chemi_link(self, response):
    items = ChemibookItem()
    # ------------------------------ BASIC INFORMATION
    link = response.url
    # Product name: text of the <a> inside the cell that follows the .td1 label cell
    name = response.css('.ProdSupplierGN_ProductA_2 .td1+ td a::text').get()
    # Synonyms: every <font> text node in the value cell of the next row
    synomym = response.css('.ProdSupplierGN_ProductA_2+ .ProdSupplierGN_ProductA_2 td+ td font::text').getall()
    items['link'] = link
    items['name'] = name
    items['synomym'] = synomym
    yield items
Answer 0 (score: 0)
I strongly recommend using an XPath expression for this task, because it lets you anchor on the label text:
response.xpath('string(//td[.="Product Name:"]/following-sibling::td[1])').get()