Question

The following function sends a simple request to this link:
http://patorjk.com/software/taag/#p=display&f=Graffiti&t=test

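All I want is to get the test message "test" rendered in those big ASCII letters.

However, for some reason the output text I am looking for is not in the HTML code saved to the protocol file. If I copy and paste the link and inspect the HTML in Google Chrome, I can see the output text.

It seems I only receive a pre-request in which the body has not been generated yet. How do I get the "correct" HTML source that contains the generated output_text?

Below are:

Python code
HTML code received via requests
HTML code when inspecting the page manually in Chrome

1. Python code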

from bs4 import BeautifulSoup
import requests

def scrape():
    """Scrape from http://patorjk.com

    Crucial section looks like:

        <pre id="taag_output_text" style="float:left;" class="fig" contenteditable="true">
        STRING STRING STRING STRING
        STRING STRING STRING STRING
        </pre>
    """

    URL = "http://patorjk.com/software/taag/#p=display&f=Graffiti&t=TEST"

    with requests.Session() as c:
        source = c.get(URL)

    soup = BeautifulSoup(source.text, "lxml")

    with open("protocol.txt", "w") as file:
        file.write(soup.prettify())

    # The keyword is `id`, not `id_` (only `class_` takes a trailing underscore).
    text = soup.find("pre", id="taag_output_text")

    if text is None:
        print("Error: output text not found.")

    return text

2. HTML code received via requests

  <div id="maincontent">
   <div id="outputFigDisplay">
   </div>

3. HTML code from manual inspection

<div id="maincontent">
    <div id="outputFigDisplay" class="fig">
        <pre id="taag_output_text" style="float:left;" class="fig" contenteditable="true">  __                   __   
        _/  |_  ____   _______/  |_ 
        \   __\/ __ \ /  ___/\   __\
         |  | \  ___/ \___ \  |  |  
         |__|  \___  >____  > |__|  
                   \/     \/        
        </pre>
        <div style="clear:both"></div>
    </div>
</div>

Answer 1

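As mentioned in the comments, the text is generated by client-side JavaScript, so it cannot be scraped with requests and bs4. You can, however, use a tool that runs the client-side JavaScript, such as selenium: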

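from selenium import webdriver
from selenium.webdriver.common.by import By

url = "http://patorjk.com/software/taag/#p=display&f=Graffiti&t=TEST"
driver = webdriver.Firefox()
try:
    driver.get(url)
    # find_element_by_id was removed in Selenium 4; use find_element(By.ID, ...)
    element = driver.find_element(By.ID, "taag_output_text")
    text = element.text
finally:
    driver.quit()  # end the browser session even if the lookup fails

print(text)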
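
A quick way to see why the requests response contains only the empty container: everything after the # in the URL is a fragment, and HTTP clients do not send the fragment to the server. The sketch below uses only the standard library to split the URL from the question; the assumption that the page's script reads the fragment to render the text is an inference from the behaviour described above, not something confirmed by the site.

```python
from urllib.parse import urldefrag

url = "http://patorjk.com/software/taag/#p=display&f=Graffiti&t=TEST"

# urldefrag separates the part that is sent to the server from the fragment,
# which stays in the browser (and is presumably read by the page's script).
requested, fragment = urldefrag(url)

print("sent to server:", requested)
print("kept client-side:", fragment)
```

So the server receives the same request regardless of the font or text chosen, which matches the empty outputFigDisplay div in the saved HTML.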

Alternatively, you can get the same ASCII art from http://www.network-science.de/ascii/ without using selenium:

import requests
from bs4 import BeautifulSoup

url = "http://www.network-science.de/ascii/ascii.php?TEXT=TEST&FONT=graffiti&RICH=no&FORM=left&STRE=no&WIDT=80"
r = requests.get(url)
soup = BeautifulSoup(r.text, 'html.parser')
text = soup.find_all('pre')[1].text

print(text)

Both methods produce the same result:

______________________ ____________________
\__    ___/\_   _____//   _____/\__    ___/
  |    |    |    __)_ \_____  \   |    |
  |    |    |        \/        \  |    |
  |____|   /_______  /_______  /  |____|
                   \/        \/
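
If the text should not be hard-coded, the query string for the second approach can be built from a dictionary. This is a minimal sketch: the parameter names are copied from the URL above, while the helper name build_ascii_url and its defaults are hypothetical, and what each parameter means on the server side is not verified here.

```python
from urllib.parse import urlencode

BASE_URL = "http://www.network-science.de/ascii/ascii.php"


def build_ascii_url(text, font="graffiti", width=80):
    """Return the request URL for the given text (hypothetical helper)."""
    # Parameter names and fixed values are taken from the URL in the answer.
    params = {
        "TEXT": text,
        "FONT": font,
        "RICH": "no",
        "FORM": "left",
        "STRE": "no",
        "WIDT": width,
    }
    # urlencode escapes spaces and special characters in the text.
    return f"{BASE_URL}?{urlencode(params)}"


print(build_ascii_url("TEST"))
```

The same dictionary can also be passed to requests directly, as in requests.get(BASE_URL, params=params), which performs the encoding itself.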
