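Getting the src from an img tag using BeautifulSoup

Time: 2017-08-01 01:59:04

Tags: web-scraping beautifulsoup python-3.5 discord.py

This is my last resort. I'm trying to build some cool embeds with my Discord bot, and the only problem is that I can't seem to get the img from the website. Can anyone help? The code below is mostly what other people told me to use, and the code I found here doesn't work.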

async def events(self, ctx):
    """Top GTAO bonuses going on right now!"""

    if ctx.message.server.me.bot:
        try:
            await self.bot.delete_message(ctx.message)
        except discord.HTTPException:
            await self.bot.send_message(ctx.message.author, 'Could not delete your message on ' + ctx.message.server.name)

    url = "https://socialclub.rockstargames.com/" 

    async with aiohttp.get(url) as response:
        soupObject = BeautifulSoup(await response.text(), "html.parser")

    try:
        rm = "[Read More](https://socialclub.rockstargames.com/events)"
        img = "https://i.imgur.com/0Gu4sSK.png"
        avi = "https://i.imgur.com/s5O1yD2.png"
        bonus1 = soupObject.find(class_='bonuses').find('ul').get_text()
        evpic = soupObject.find(class_='eventThumb').find('img').get('src')
        # EMBED
        data = discord.Embed(title='GTA Online Bonuses', description='The Current GTA Online Bonuses', colour=0xE4BA22)
        data.set_author(name='Rockstar Games', icon_url=avi)
        data.add_field(name="This week: \n", value=bonus1)
        data.add_field(name="--------", value=rm)
        data.set_image(url=evpic)
        data.set_thumbnail(url=img)
        await self.bot.say(embed=data)


    except discord.HTTPException:
        await self.bot.say("I need the `Embed links` permission to send this OR error")

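1 answer:

Answer 0 (score: 1)

Inspecting the site shows that Rockstar does not set the src attribute on their images, because image loading is handled by some internal JS. The URL is stored in a data-src attribute instead: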

>>> soup.find(attrs={'class':'eventThumb'})
<div class="eventThumb">
<img class="lazyload" data-src="https://prod.cloud.rockstargames.com/global/Events/20449/829a53e7-d14e-4de8-a17b-ccb06becfed6.jpg"/>
</div>
>>> _.img
<img class="lazyload" data-src="https://prod.cloud.rockstargames.com/global/Events/20449/829a53e7-d14e-4de8-a17b-ccb06becfed6.jpg"/>
>>> _.get('data-src')
'https://prod.cloud.rockstargames.com/global/Events/20449/829a53e7-d14e-4de8-a17b-ccb06becfed6.jpg'

To fix this, change .get('src') to .get('data-src').
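
If the page might serve either attribute, a slightly more defensive version could look like the sketch below. It is only an illustration: the `image_url` helper, the `plainThumb` block and the example.com URL are made up for the demo, and the sample markup is copied from the output above rather than fetched live. It also returns None instead of raising AttributeError when the container or the img is missing.

```python
from bs4 import BeautifulSoup

# Sample markup modeled on the output shown above. The "plainThumb" block is
# a made-up example of an ordinary <img src=...> for comparison.
SAMPLE_HTML = """
<div class="eventThumb">
<img class="lazyload" data-src="https://prod.cloud.rockstargames.com/global/Events/20449/829a53e7-d14e-4de8-a17b-ccb06becfed6.jpg"/>
</div>
<div class="plainThumb">
<img src="https://example.com/plain.jpg"/>
</div>
"""


def image_url(soup, container_class):
    """Return the image URL inside the first element with the given class, or None."""
    container = soup.find(class_=container_class)
    if container is None:
        return None
    img = container.find("img")
    if img is None:
        return None
    # Prefer the lazy-load attribute, then fall back to the regular src.
    return img.get("data-src") or img.get("src")


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
event_url = image_url(soup, "eventThumb")     # taken from data-src
plain_url = image_url(soup, "plainThumb")     # taken from src
missing_url = image_url(soup, "noSuchClass")  # None, no exception
print(event_url)
```

In the bot code from the question, the equivalent would be something like `evpic = image_url(soupObject, 'eventThumb')`, followed by a check that `evpic` is not None before calling `data.set_image(url=evpic)`.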