Python Flask: fetched HTML does not load correctly when served as-is

Time: 2019-09-12 03:15:06

Tags: python html python-3.x flask

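So, I have some code that lets me visit any website I want by URL while staying "invisible". You can reach it on my domain here: "sm--supermechm500.repl.co/home/ViewWindow". Try entering "http://youtube.com/". (It is invisible because you cannot get back to the site through your history.) You will notice that the site appears broken. All I do is fetch the page's HTML and then re-render it exactly as it is, except for the title tag.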

If none of the code was modified, why does the page appear broken? Does Flask fail to recognize some of the HTML it receives?
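One possible explanation, offered as a hedged sketch rather than a confirmed diagnosis: a browser resolves relative links (stylesheets, scripts, images) against the URL the page was served from, not the URL the HTML was written for. The snippet below uses the standard library's `urljoin` to show the difference. The `https` scheme on the proxy URL and the path `/static/app.css` are assumptions made for illustration; they do not come from the actual page.

```python
from urllib.parse import urljoin

# URL the browser believes the page came from (the Flask route in the question;
# the https scheme is an assumption).
proxy_page = "https://sm--supermechm500.repl.co/home/ViewWindow/ViewWindowResult/"
# URL the HTML was originally written for.
original_page = "http://youtube.com/"

# Hypothetical relative link of the kind many pages contain.
relative_link = "/static/app.css"

# The same link points at two different hosts depending on the base URL.
via_proxy = urljoin(proxy_page, relative_link)
via_original = urljoin(original_page, relative_link)

print(via_proxy)
print(via_original)
```

If the proxy domain does not serve that path, the request fails and the page renders without its styles or scripts, even though the HTML itself is unchanged.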

Here is the Python code in question:


import urllib.request as urllibrequest # assumed origin of the "urllibrequest" alias

from bs4 import BeautifulSoup
from flask import Flask, request

web_site = Flask(__name__) # assumed definition of the app object

@web_site.route('/home/ViewWindow/ViewWindowResult/', methods=('GET', 'POST'))
def ViewWindowResult():
  urlboi = request.values.get('url') # received through form
  try:
    response = urllibrequest.urlopen(urlboi) # attempts to open url
  except Exception:
    raise ValueError

  retries = 20
  while retries > 0: # was "retries <= 20", which could never count down to an exit
    try:
      htmlBytes = response.read() # attempts to read and retries if failed
    except Exception:
      retries -= 1 # was "retries =- 1", which assigns -1 instead of decrementing
      continue
    else:
      break

  if retries > 0:
    htmlstr = htmlBytes.decode("utf8")
    soupedhtmlstr = BeautifulSoup(htmlstr, 'html.parser') # name the parser explicitly
    if soupedhtmlstr.title is not None:
      soupedhtmlstr.title.string = 'ViewWindow - Result' # replace the original title text
    sortedhtmlstr = soupedhtmlstr.prettify() # make sure code is nested correctly
    return sortedhtmlstr
  else: # if retries is exhausted, end attempt and lead to 500 error.
    raise ValueError

Any suggestions for changes are also welcome.
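As one possible change, and only a sketch under assumptions: inserting a `<base href="...">` element into the fetched HTML tells the browser to resolve relative links against the original site instead of the proxy domain. The helper name `add_base_tag` and the sample markup are hypothetical, and the simple string search only matches a bare `<head>` tag with no attributes.

```python
from html import escape


def add_base_tag(page_html: str, base_url: str) -> str:
    """Insert <base href=...> right after the first bare <head> tag, if present."""
    marker = "<head>"
    index = page_html.find(marker)
    if index == -1:
        # No bare <head> tag found: return the HTML unchanged.
        return page_html
    insert_at = index + len(marker)
    base_tag = f'<base href="{escape(base_url, quote=True)}">'
    return page_html[:insert_at] + base_tag + page_html[insert_at:]


# Hypothetical sample page with a relative image link.
sample = "<html><head><title>x</title></head><body><img src='/logo.png'></body></html>"
patched = add_base_tag(sample, "http://youtube.com/")
print(patched)
```

This may not be enough on its own: scripts that check the page's origin, or resources that refuse cross-origin requests, can still fail when the page is served from another domain.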

0 answers:

No answers