Question

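Hey, so I'm trying to get the current oil price and then do some math with it. I'm having trouble pulling the number I need out of the website. Here's my code: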

    # Module oilcost.py to compute the delivery cost for home heating oil.
    # Assume your delivery company charges a 10% fee on top of the price
    # per gallon.  The module should take one command line argument
    # indicating the number of gallons needed and should output the
    # total cost.

    import sys
    import re
    import urllib


    def getOilPrice(url):
        f = urllib.urlopen(url)
        html=f.read()
        f.close()
        match = re.search(r'<span class="dailyPrice">( d+.? d+)</span>', html)
        return match.group(1) if match else '0'

    def outputPrice(oilprice, gallons, total):
        print 'The current oil price is $ %s' %oilprice


    def main():
        url = 'http://www.indexmundi.com/commodities/?commodity=heating-oil'
        oilprice = float(getOilPrice(url))     # Create this method
        gallons = float(sys.argv[1])                      # Get from command line
        total = (gallons * 1.1) * oilprice
        outputPrice(oilprice, gallons, total)  # Create this method
    if __name__ == '__main__':
        main()

Can anyone tell me what I'm doing wrong?

Answer 1

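Parsing HTML is notoriously fraught with peril; but for the purposes of a homework assignment that probably doesn't matter much, and it's a good opportunity to learn regular expressions.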

On the line:

    match = re.search(r'<span class="dailyPrice">( d+.? d+)</span>', html)
    #                                              ^    ^

You have some d's there, which match a literal d character. Did you perhaps mean \d (with a backslash)?
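A quick Python 3 check of the point above, run against a made-up one-line snippet standing in for the downloaded page (the real markup may differ). Each " d" from the question is replaced by "\d". Note that the unescaped . still matches any single character, not only a decimal point.

```python
import re

# Hypothetical stand-in for the downloaded page.
html = '<span class="dailyPrice">3.23</span>'

# As written in the question: " d+" is a space followed by literal d's.
broken = re.search(r'<span class="dailyPrice">( d+.? d+)</span>', html)

# Each " d" replaced by "\d": \d matches a single digit.
fixed = re.search(r'<span class="dailyPrice">(\d+.?\d+)</span>', html)

print(broken)          # None
print(fixed.group(1))  # 3.23
```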

Answer 2

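Your regex doesn't match the page content. You have:

    ( d+.? d+)

but the page has:

    3.23

Your regex matches: a space, followed by one or more d characters, followed by one optional character of any kind, followed by a space, followed by one or more d characters. This might work better:

    (\d+(\.\d+)?)

That is: one or more digits, followed by an optional group made up of a literal . character and one or more digits.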
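A minimal Python 3 sketch of this pattern in use. The dailyPrice span is assumed from the question's code and may not match the live page. get_oil_price and delivery_cost are illustrative names, not from either post. get_oil_price takes an HTML string rather than a URL so the parsing can be tried offline; in Python 3 the page itself could be fetched with urllib.request.urlopen.

```python
import re

# The pattern above, wrapped in the span markup assumed from the question.
PRICE_RE = re.compile(r'<span class="dailyPrice">(\d+(\.\d+)?)</span>')


def get_oil_price(html):
    """Return the dailyPrice value as a float, or 0.0 when nothing matches."""
    match = PRICE_RE.search(html)
    return float(match.group(1)) if match else 0.0


def delivery_cost(gallons, price, fee=0.10):
    """Gallons times price per gallon, plus the assignment's 10% delivery fee."""
    return gallons * price * (1 + fee)


# Group 1 is the whole number; group 2 is the optional fractional part.
for sample in ['3.23', '3']:
    m = re.search(r'(\d+(\.\d+)?)', sample)
    print(m.group(1), m.group(2))  # "3.23 .23", then "3 None"

html = '<span class="dailyPrice">3.23</span>'  # hypothetical page snippet
price = get_oil_price(html)
print(price)                                 # 3.23
print(round(delivery_cost(100, price), 2))   # 355.3
```

Because the fractional group is optional, a whole-number price such as 3 also matches. By contrast, \d+.?\d+ would reject it, since that pattern needs at least two digits.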

Python code for searching a URL
