Question

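I'm running a spider that extracts information such as price and shipping cost. The shipping information I get back looks like "Shipping:$.99,Shipping:,Shipping:,Shipping:$.49". The code that extracts it looks like this: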

item["shipping"] = vendor.xpath("normalize-space(.//span[@class='shippingAmount']/text())").extract()

Can I write this line so that it pulls only the price that comes after "Shipping:"?

Answer 1

Use a combination of substring-after and substring-before, i.e.

substring-before(
  substring-after(
    "Shipping:$.99,Shipping:,Shipping:,Shipping:$.49",
    "Shipping:"),
  ","
)

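In XPath 1.0 there is no way to get all the shipping costs when their number is arbitrary. You can query the 2nd, 3rd, ... value by calling substring-after($string, "Shipping:") repeatedly, with each call stripping off the preceding value.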

(The line breaks can of course be omitted.)
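To make the repeated substring-after idea concrete, here is a minimal sketch that mimics the two XPath 1.0 functions in plain Python. The helpers substring_after, substring_before and nth_shipping are hypothetical stand-ins written for illustration, not part of Scrapy or any XPath library. Note one assumption: a comma is appended before the final substring_before call, because the last value has no trailing comma and would otherwise come back empty (in XPath the same thing could be done with concat).

```python
def substring_after(s, sep):
    # Mimics XPath 1.0 substring-after: "" when sep does not occur in s.
    _, found, tail = s.partition(sep)
    return tail if found else ""


def substring_before(s, sep):
    # Mimics XPath 1.0 substring-before: "" when sep does not occur in s.
    head, found, _ = s.partition(sep)
    return head if found else ""


def nth_shipping(s, n):
    # Hypothetical helper: return the n-th (1-based) shipping value.
    rest = s
    for _ in range(n):
        # Each call strips everything up to and including the next "Shipping:".
        rest = substring_after(rest, "Shipping:")
    # Append a comma so the last value also has a terminator.
    return substring_before(rest + ",", ",")


raw = "Shipping:$.99,Shipping:,Shipping:,Shipping:$.49"
values = [nth_shipping(raw, n) for n in (1, 2, 3, 4)]
print(values)
```

Because the number of nested calls is fixed when the expression is written, this only works when you know in advance which position you want, which is the limitation described above.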

Answer 2

You can extract the prices with a regular expression:

import re

# Sample string in the same shape as the scraped shipping data
s = "Shipping:$.99,Shipping:,Shipping:,Shipping:$.49"
re.findall(r'\d*\.?\d+', s)
['.99', '.49']

Edit

If there is no shipping cost, use 0:

[float(x) if x else 0 for x in re.sub(r'Shipping:\$?', '', s).split(',')]
[0.99, 0, 0, 0.49]
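If this is needed in more than one place, the same idea can be wrapped in a small function. The following is only a sketch under some assumptions: the names parse_shipping and _PRICE are made up for illustration, every entry is assumed to start with the literal "Shipping:", and Decimal is used instead of float because the values are money. That last point is a design choice of this sketch, not something the answer above requires.

```python
import re
from decimal import Decimal

# Hypothetical pattern: "Shipping:", an optional "$", then an optional amount.
_PRICE = re.compile(r"Shipping:\s*\$?\s*(\d*\.?\d+)?")


def parse_shipping(text):
    # findall returns "" for entries where the optional amount group is absent,
    # so those entries are mapped to a zero cost.
    return [Decimal(m) if m else Decimal("0") for m in _PRICE.findall(text)]


raw = "Shipping:$.99,Shipping:,Shipping:,Shipping:$.49"
print(parse_shipping(raw))
```

In a spider, a function like this would be applied to the string returned by the XPath query before it is stored on the item.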