I have this line:
<div data-asin="B0000BYDR1" data-asin-currency-code="USD" data-asin-price="45.66" data-asin-shipping="0" data-device-type="WEB" data-display-code="Asin is not eligible because it is price competitive" data-substitute-count="-1" id="cerberus-data-metrics" style="display: none;"></div>
I want to extract the price, 45.66, which sits between data-asin-price=" and " data-asin-shipping.
I found this code, but it does not work well:
def extractSubstring(text, sub1, sub2):
    pos1 = text.lower().find(sub1) + len(sub1)
    pos2 = text.lower().find(sub2)
    if pos1 > pos2 and pos2 > 0:
        return text[pos1:pos2]
    elif pos2 > pos1 and pos1 > 0:
        return text[pos2:pos1]
    elif pos1 > 0:
        return text[pos1:]
    elif pos2 > 0:
        return text[pos2:]

result = soup.find_all(attrs={"data-asin-currency-code": "USD"})
priceLine='<div data-asin="B0000BYDR1" data-asin-currency-code="USD" data-asin-price="45.66" data-asin-shipping="0" data-device-type="WEB" data-display-code="Asin is not eligible because it is price competitive" data-substitute-count="-1" id="cerberus-data-metrics" style="display: none;"></div>'
sub1 = 'data-asin-price="'
sub2 = '" data-asin-shipping'
substring = extractSubstring(str(priceLine), sub1, sub2)
Answer 0 (score: 0)
BeautifulSoup is the way to go:
import bs4
html = bs4.BeautifulSoup('<div data-asin="B0000BYDR1" data-asin-currency-code="USD" data-asin-price="45.66" data-asin-shipping="0" data-device-type="WEB" data-display-code="Asin is not eligible because it is price competitive" data-substitute-count="-1" id="cerberus-data-metrics" style="display: none;"></div>', 'html.parser')
Then:
print(html.div['data-asin-price'])
45.66
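
As a side note on why the helper in the question comes back empty: with these inputs pos2 is greater than pos1, so the second branch runs and returns text[pos2:pos1], a slice whose start is past its end. The function also adds len(sub1) to the result of find() before checking for -1, so a missing marker goes unnoticed. If you do want plain string slicing instead of a parser, a minimal sketch could look like the following. The name extract_between and the shortened price_line are my own, for illustration only, and this assumes the attribute order in the markup never changes, which is exactly why a parser is the safer choice.

```python
def extract_between(text, start_marker, end_marker):
    """Return the text between the two markers, or None if either is missing."""
    start = text.find(start_marker)
    if start == -1:
        return None
    # Move past the start marker so the slice begins at the value itself
    start += len(start_marker)
    # Search for the end marker only after the start marker
    end = text.find(end_marker, start)
    if end == -1:
        return None
    return text[start:end]


# Shortened copy of the line from the question (hypothetical sample input)
price_line = ('<div data-asin="B0000BYDR1" data-asin-currency-code="USD" '
              'data-asin-price="45.66" data-asin-shipping="0"></div>')

price = extract_between(price_line, 'data-asin-price="', '" data-asin-shipping')
print(price)
```

The result is a string, so wrap it in float() or decimal.Decimal() if you need to do arithmetic on it.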