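How do I search a website's HTML with Python 3.6?

Asked: 2019-02-11 17:54:20

Tags: python html search python-requests python-3.6

I have a lot of gift codes and need to build a checker that tells me whether each one is valid by searching the page's HTML for certain words. The text I am looking for is "Gift Code Invalid".

When I try to read the HTML with urllib or requests, only a small part of the HTML is loaded. I'm a beginner, so I may be doing something wrong.

My code is: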

import requests
link = "https://discord.gift/o2uzOR7YE3CoBpGq"
r = requests.get(link)
print(r.text)

The output is:

<!DOCTYPE html>
<html>
  <head>
    <meta charset="utf-8" />
    <meta content="width=device-width, initial-scale=1.0, maximum-scale=1, user-scalable=no" name="viewport" />

    <!-- section:seometa -->
    <meta property="og:type" content="website" />
    <meta property="og:site_name" content="Discord" />
    <meta property="og:title" content="Discord - Free voice and text chat for gamers" />
    <meta
      property="og:description"
      content="Step up your game with a modern voice & text chat app. Crystal clear voice, multiple server and channel support, mobile apps, and more. Get your free server now!"
    /><meta property="og:image" content="https://discordapp.com/assets/ee7c382d9257652a88c8f7b7f22a994d.png" />    <meta name="twitter:card" content="summary_large_image" />
    <meta name="twitter:site" content="@discordapp" />
    <meta name="twitter:creator" content="@discordapp" />
    <!-- endsection -->

    <link
      rel="chrome-webstore-item"
      href="https://chrome.google.com/webstore/detail/lcbhdgefieegnkbopmgklhlpjjdgmbog"
    />
<link rel="stylesheet" href="/assets/0.830216ebaf585f92a484.css" integrity="sha256-qzZED1N67NuVMyWOdvhIGhtLtKnOXSg+F3HcanmdW4Q= sha512-D0iS5hrftKNpXWnvjpfujnvlabUq6K5gsHbsdvctRMtQXzdf2jvZ/JwaRHAPSb9Z5Xb2o8SBeXeMTajvtrkeRw=="><link rel="icon" href="/assets/07dca80a102d4149e9736d4b162cff6f.ico" />    <!-- section:title -->
    <title>Discord</title>
    <!-- endsection -->
  </head>

  <body>
    <div id="app-mount"></div><script nonce="NjksMjM0LDU4LDI4LDkxLDUxLDYzLDE3Mg==">window.__OVERLAY__ = /overlay/.test(location.pathname)</script><script nonce="NjksMjM0LDU4LDI4LDkxLDUxLDYzLDE3Mg==">window.GLOBAL_ENV = {
      API_ENDPOINT: '//discordapp.com/api',
      WEBAPP_ENDPOINT: '//discordapp.com',
      CDN_HOST: 'cdn.discordapp.com',
      ASSET_ENDPOINT: 'https://discordapp.com',
      WIDGET_ENDPOINT: '//discordapp.com/widget',
      INVITE_HOST: 'discord.gg',
      GIFT_CODE_HOST: 'discord.gift',
      MARKETING_ENDPOINT: '//discordapp.com',
      NETWORKING_ENDPOINT: '//router.discordapp.net',
      RELEASE_CHANNEL: 'stable',
      BRAINTREE_KEY: 'production_5st77rrc_49pp2rp4phym7387',
      STRIPE_KEY: 'pk_live_CUQtlpQUF0vufWpnpUmQvcdi',
    };</script><script nonce="NjksMjM0LDU4LDI4LDkxLDUxLDYzLDE3Mg==">!function(){if(null!=window.WebSocket){var n=function(n){try{var e=localStorage.getItem(n);return null==e?null:JSON.parse(e)}catch(n){return null}},e=n("token"),o=n("gatewayURL");if(e&&o){var r=null!=window.DiscordNative||null!=window.require?"etf":"json",t=o+"/?encoding="+r+"&v=6";void 0!==window.Uint8Array&&(t+="&compress=zlib-stream"),console.log("[FAST CONNECT] "+t+", encoding: "+r+", version: 6");var a=new WebSocket(t);a.binaryType="arraybuffer";var i=Date.now(),s={open:!1,gateway:t,messages:[]};a.onopen=function(){console.log("[FAST CONNECT] connected in "+(Date.now()-i)+"ms"),s.open=!0},a.onclose=a.onerror=function(){window._ws=null},a.onmessage=function(n){s.messages.push(n)},window._ws={ws:a,state:s}}}}();</script><script src="/assets/294f56f239ff22f62fc1.js" integrity="sha256-wTRQJKoqMfG3makS9dDuuegpcHSdaGmfoEBQUPXMdDM= sha512-OVrPyjx2akoJ6QS8OZ+9blz/ADtDHruxw4gwLsjfDVUgolO1ZtcgWbOo0Zj9JBNyzAjKOSCfoFoN9lnkF0EYCw=="></script><script src="/assets/eaa48b00154d2e7ac545.js" integrity="sha256-FRTrm1gL5gkDUoKwVuL9hrrmllKXQsZg7r5zy0Xo4bo= sha512-QZ4c5JQKE5rLJf1uGLQaHHL4NpkAigt4TtluicuMZDYDE5fiL7wkaD2CMBxr0xhOO5aNfSFCxcaqBkU/xOEggQ=="></script><script src="/assets/c73d229b094bb39f0686.js" integrity="sha256-thaBLLvK6Up+B8O7zIOF9Uv8IF+gwGuOW+WUe26l/vk= sha512-5ez2fLO3oMI1UPZDif1Szfjwz04ftTNfhWWSqM81hNhuVN7kckAAZR5a1SuQG8rgsqXwN1is53uAL5M2rz/FOg=="></script>  </body>
</html>

As you can see in the first image, the site's HTML contains the text "Gift Code Invalid", but that string is not in the Python output.

https://ctrlv.cz/kKd3

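2 answers:

Answer 0 (score: 4)

The "Gift Code Invalid" text you are looking for is probably rendered by JavaScript. requests only downloads the HTML the server sends and does not execute JavaScript, which is why you can't find the string.

If you are using Python 3.6, try requests-html, which can render pages whose content is produced by JavaScript.

Updated example: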
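To see why the string is missing, it can help to look at what text the static HTML actually carries. The sketch below is only an illustration: `STATIC_HTML` is a trimmed stand-in for the response in the question, and `VisibleTextParser` and `visible_text` are hypothetical helper names built on the standard library's `html.parser`.

```python
from html.parser import HTMLParser


class VisibleTextParser(HTMLParser):
    """Collect text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside <script> or <style>
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is not script or style source.
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def visible_text(html):
    """Return the text a reader could see without running any JavaScript."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Trimmed stand-in for the response in the question (an assumption, not the
# full page): an empty mount point plus a script that fills it in later.
STATIC_HTML = """<!DOCTYPE html>
<html>
  <head><title>Discord</title></head>
  <body>
    <div id="app-mount"></div>
    <script>window.GLOBAL_ENV = {GIFT_CODE_HOST: 'discord.gift'};</script>
  </body>
</html>"""

text = visible_text(STATIC_HTML)
print(text)
print("Gift Code Invalid" in text)
```

The only text left is the page title, because `<div id="app-mount"></div>` is empty until the scripts run in a browser.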

from requests_html import HTMLSession

link = 'https://discord.gift/o2uzOR7YE3CoBpGq'
targetString = "Gift Code Invalid"
session = HTMLSession()
r = session.get(link)
print("Before render is call: ", targetString in r.html.text)
# sleep=1 pauses after the initial render so the script-generated text is present when the HTML is read
r.html.render(wait=2, sleep=1)
print("After render is call: ", targetString in r.html.text)

Output:

Before render is call:  False
After render is call:  True
Process finished with exit code 0

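See the library's documentation for other approaches, such as finding elements or converting the response to an lxml object after rendering: https://html.python-requests.org/

Answer 1 (score: 1)

In the background, the site sends an AJAX request to check whether the gift code is valid. The server replies with a JSON response indicating whether the code is valid, and JavaScript then fills that data into the page.

The simplest way to get the result you want is to mimic that AJAX request and read the message. You can do this without selenium, requests-html, or any other JavaScript rendering mechanism and still get the desired output: whether the gift code is valid.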

import requests
gift_code = 'o2uzOR7YE3CoBpGq'  # gift code here
link = f"https://discordapp.com/api/v6/entitlements/gift-codes/{gift_code}?with_application=true&with_subscription_plan=true"
r = requests.get(link)
print(r.json()['message'])

Output:

Unknown Gift Code
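Since the question is about checking many codes, the same request could be wrapped in a small loop. This is a minimal sketch, not a tested client: `is_reported_invalid`, `fetch_payload`, `check_codes`, and `fake_fetch` are hypothetical names, and because only the "Unknown Gift Code" response is shown above, the sketch detects that one case and makes no claim about what a valid code returns.

```python
from typing import Callable, Dict, Iterable

# Endpoint taken from the answer above.
API_TEMPLATE = (
    "https://discordapp.com/api/v6/entitlements/gift-codes/{code}"
    "?with_application=true&with_subscription_plan=true"
)
INVALID_MESSAGE = "Unknown Gift Code"  # the message printed in the output above


def is_reported_invalid(payload: dict) -> bool:
    """True when the JSON body carries the 'Unknown Gift Code' message."""
    return payload.get("message") == INVALID_MESSAGE


def fetch_payload(code: str) -> dict:
    """Send the same GET request as the answer and decode the JSON body."""
    import requests  # imported here so the helpers work without a network stack

    response = requests.get(API_TEMPLATE.format(code=code), timeout=10)
    return response.json()


def check_codes(
    codes: Iterable[str],
    fetch: Callable[[str], dict] = fetch_payload,
) -> Dict[str, bool]:
    """Map each code to True if the API reports it as unknown."""
    return {code: is_reported_invalid(fetch(code)) for code in codes}


def fake_fetch(code: str) -> dict:
    """Offline stand-in for fetch_payload; the empty dict is a made-up payload."""
    if code == "o2uzOR7YE3CoBpGq":
        return {"message": INVALID_MESSAGE}
    return {}


# Offline demonstration; drop the fetch argument to query the real endpoint.
results = check_codes(["o2uzOR7YE3CoBpGq", "hypothetical-code"], fetch=fake_fetch)
for code, invalid in results.items():
    print(code, "-> reported invalid" if invalid else "-> not reported invalid")
```

Passing the fetch function as a parameter keeps the classification logic testable without network access. A code that is "not reported invalid" may still fail for other reasons, so any other `message` value should be inspected before a code is treated as valid.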