Question

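I am trying to parse the image of a Telegram channel, for example https://t.me/versusbattlerus. The image is in this block: <img class="tgme_page_photo_image" src="https://...">, but every time the method returns a different link, and the link does not work. Why does this happen? I am using Python 3.6, urllib, and beautifulsoup4.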

Method

import urllib.request
from bs4 import BeautifulSoup


def get_html(url):
    response = urllib.request.urlopen(url)
    return response.read()


def parse(html):
    soup = BeautifulSoup(html, 'lxml')
    image = soup.find('img', class_="tgme_page_photo_image")
    print(image)
    #return image


def main():
    parse(get_html('https://t.me/versusbattlerus'))


if __name__ == '__main__':
    main()

Answer 1

This script works for me. Please provide a "broken" link so it can be tested.

If there is a bug, try this simple Linux shell solution, which reads the image URL from the page's og:image meta tag instead:

curl -s https://t.me/SeanChannel |grep -oP '"og:image" content="\K.+(?=")'
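
For anyone who prefers to stay in Python, the same idea as the shell one-liner can be sketched roughly as below. This is only a sketch under a few assumptions: that the channel page exposes the image as a `<meta property="og:image" content="...">` tag (which is what the grep pattern suggests), and that the standard-library `html.parser` is acceptable in place of BeautifulSoup. The names `OgImageParser`, `extract_og_image`, and `fetch_og_image` are made up for this example.

```python
import urllib.request
from html.parser import HTMLParser


class OgImageParser(HTMLParser):
    """Remembers the content of the first <meta property="og:image"> tag."""

    def __init__(self):
        super().__init__()
        self.og_image = None

    def handle_starttag(self, tag, attrs):
        # Only look at <meta> tags, and stop once a value has been found.
        if tag != "meta" or self.og_image is not None:
            return
        attributes = dict(attrs)
        # Assumption: the page marks the image with property="og:image".
        if attributes.get("property") == "og:image":
            self.og_image = attributes.get("content")


def extract_og_image(html_text):
    """Return the og:image URL found in html_text, or None if there is none."""
    parser = OgImageParser()
    parser.feed(html_text)
    return parser.og_image


def fetch_og_image(url):
    """Download the page at url and return its og:image URL, or None."""
    with urllib.request.urlopen(url) as response:
        html_text = response.read().decode("utf-8", errors="replace")
    return extract_og_image(html_text)
```

Calling `fetch_og_image('https://t.me/SeanChannel')` would then play the role of the curl command. Whether the returned link behaves any differently from the `tgme_page_photo_image` one is not something this sketch can guarantee.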
