Question

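I wrote a program that scrapes data from a downloaded HTML file. I would like it to scrape the website live instead. Can someone help me work out what I need to change? I don't know how to move from a downloaded file to the live website.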

import sys


def getStockValue(source):
    # Find the "fac-yshl" marker, then the data-value attribute that follows it.
    start = source.find("fac-yshl")
    start = source.find("data-value", start + 1)
    # The value sits between the next pair of double quotes.
    start = source.find('"', start + 1)
    value = source[start:source.find('"', start + 1)]

    # Strip the leading quote left over from the slice.
    value = value.replace('"', "")
    value = value.replace("'", "")

    print("stock Value :", value)
    return value
# end of getStockValue()


def openFile(file):
    # read the whole file, return it as a string
    try:  # is the file there??
        with open(file, "r") as f:
            return f.read()
    except IOError:
        print("  \aNo such file!!!! \"", file, "\" so exiting")
        sys.exit(1)
# end of openFile()


def begin(inFile):
    # inFile is the name of the html file that is saved from google's stock page
    print("Loading file,", inFile)
    source = openFile(inFile)  # load the whole file.
    stockValue = getStockValue(source)

    print("  Loaded file,", inFile, "\n  Stock market price:", stockValue)
# end of begin()


if __name__ == '__main__':
    if len(sys.argv) == 2:
        begin(sys.argv[1])
    else:
        sys.exit(0)

Answer 1

Take a look at BeautifulSoup and requests. Here is an example from one of my projects that does what I think you are trying to do:

import requests
from bs4 import BeautifulSoup
first_page = requests.get("http://store.steampowered.com/search/?" + filter)  # filter is defined elsewhere in my project
html = BeautifulSoup(first_page.content, "html.parser")

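When you say downloaded, I assume you mean you did it manually. This is still "downloading" the HTML, but it is held in memory rather than saved to a file, and it is done for you.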
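To connect this back to the question's code: the parsing step can stay as it is, and only the place the HTML string comes from needs to change. Below is a minimal sketch under some assumptions. It uses the standard library's `urllib.request` as a stand-in for requests so it runs without extra installs (with requests, the fetch would be `requests.get(url).text`). The names `fetch_source`, `get_stock_value` and `live_stock_value` are hypothetical, the `User-Agent` header is an assumption about what the target site accepts, and the search still relies on the `fac-yshl` / `data-value` markers from the question being present in the live page.

```python
from urllib.request import Request, urlopen


def get_stock_value(source):
    """Same idea as the question's getStockValue, but returns None if a marker is missing."""
    marker = source.find("fac-yshl")
    if marker == -1:
        return None
    attr = source.find("data-value", marker)
    if attr == -1:
        return None
    # The value sits between the first pair of double quotes after data-value.
    open_quote = source.find('"', attr)
    if open_quote == -1:
        return None
    close_quote = source.find('"', open_quote + 1)
    if close_quote == -1:
        return None
    return source[open_quote + 1:close_quote]


def fetch_source(url, timeout=10):
    """Download a page and return its HTML as a string (takes the place of openFile)."""
    # Assumption: some sites reject requests that lack a browser-like User-Agent.
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def live_stock_value(url):
    """Fetch the page at url and extract the stock value from it."""
    return get_stock_value(fetch_source(url))
```

Calling `live_stock_value(url)` with the stock page's address would then replace `begin(inFile)`. If the value is filled in by JavaScript after the page loads, it will not be in the fetched HTML and this approach returns None.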

Python web scraping in real time
