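I wrote a program that scrapes a value out of a downloaded HTML file. I would like it to scrape the website live instead. Can someone help me work out what I need to change? I don't know how to move it from a downloaded file to a live website.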
import sys

def getStockValue(source):
    # find the "fac-yshl" marker, then the quoted data-value that follows it
    start = source.find("fac-yshl")
    start = source.find("data-value", start + 1)
    start = source.find('"', start + 1)
    value = source[start:source.find('"', start + 1)]
    value = value.replace('"', "")
    value = value.replace("'", "")
    print("stock Value :", value)
    return value
# end of getStockValue()

def openFile(file):
    # read the file and return its contents as a string
    try:  # is the file there??
        with open(file, "r") as f:
            return f.read()
    except IOError:
        print(" \aNo such file!!!! \"", file, "\" so exiting")
        sys.exit(1)
# end of openFile()

def begin(inFile):
    # inFile is the name of the html file that is saved from google's stock page
    # starter file
    print("Loaded file,", inFile, " Stockmarketprice:")
    source = openFile(inFile)  # load the whole file.
    stockValue = getStockValue(source)
    print(" Loaded file,", inFile, "\n Stock market price:", stockValue)
# end of begin()

if __name__ == '__main__':
    if len(sys.argv) == 2:
        begin(sys.argv[1])
    else:
        sys.exit(0)
Answer 0 (score: 0)
Take a look at BeautifulSoup and requests. Here is an example from one of my projects that does what I think you want to do:
import requests
from bs4 import BeautifulSoup
first_page = requests.get("http://store.steampowered.com/search/?" + filter)  # filter: query string built elsewhere in my project
html = BeautifulSoup(first_page.content, "html.parser")
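When you say "downloaded", I assume you mean you saved the page by hand. This still "downloads" the HTML, but it does it for you and keeps the result in memory instead of saving it to a file.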
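Applied to the code in the question, the change could look roughly like the sketch below: the file-reading step is swapped for an HTTP request and the string search is left as it was. The names `get_stock_value`, `fetch_html` and `begin` are illustrative, the 10-second timeout is an arbitrary choice, and it assumes the live page still contains the same "fac-yshl" / data-value markup as the saved file, which may not hold.

```python
def get_stock_value(source):
    """Return the quoted data-value that follows the "fac-yshl" marker, or None."""
    start = source.find("fac-yshl")
    if start == -1:
        return None  # marker not present in this page
    start = source.find("data-value", start + 1)
    start = source.find('"', start + 1)   # opening quote of the attribute value
    end = source.find('"', start + 1)     # closing quote
    return source[start + 1:end]


def fetch_html(url):
    """Download a page and return its HTML as a string held in memory."""
    import requests  # third-party; imported here so the parser works without it
    response = requests.get(url, timeout=10)  # timeout value is an assumption
    response.raise_for_status()  # raise on 4xx/5xx instead of parsing an error page
    return response.text


def begin(url):
    # Same flow as the original begin(), but the source comes from the network
    source = fetch_html(url)
    stock_value = get_stock_value(source)
    print("Loaded URL:", url, "\nStock market price:", stock_value)
    return stock_value
```

You would then call `begin()` with the page's URL where the original script passed a file name. If the site builds the price with JavaScript after the page loads, the value will not be in the HTML that requests returns, and this approach will not find it.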