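I'm using Python to access this site and scrape its HTML: http://forum.toribash.com/tori_spy.php
As you can see if you visit the page, the content changes after a few seconds. It's a page that shows the latest posts on the forum, and I'm making a Discord bot that can display the latest post.
Right now it shows the first post in the list as it is before any of the animations/changes happen.
I'd like to know if there is a way to skip the animations, or to have the program wait a few seconds after accessing the page before grabbing all the HTML.
Current code: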
if message.content.startswith("-post"):
    await client.send_message(message.channel, ":arrows_counterclockwise: **Accessing forums...**")
    await client.send_typing(message.channel)
    time.sleep(5)
    # access site
    session_requests = requests.session()
    url = "http://forum.toribash.com/tori_spy.php"
    result = session_requests.get(url, headers=dict(referer=url))
    # access html
    tree = html.fromstring(result.content)
    list_stuff = []
    for atag in tree.xpath("//strong/a"):  # search for <strong><a>
        list_stuff.append(atag.text_content())
    await client.send_message(message.channel, ":white_check_mark: Last post was in the thread **"+list_stuff[0]+"**")
Thanks a lot!
Answer 0 (score: 0)
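The page uses AJAX/XHR to load new posts. It requests a URL like this:
forum.toribash.com/vaispy.php?do=xml&last=9297850&r=0....
`last` is the ID of the last message. You can find it in the HTML, inside a <script> tag, as `highestid = 9297850;`. `r` doesn't seem to matter; at least the code works for me without `r`.
Once you have `highestid`, you can use it to get the newest messages as XML.
In the XML, each message's ID appears as <postid>, so you can use it in the next request.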
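As a rough sketch of the `highestid` lookup described above, a regular expression can pull the value out of the page source without depending on which line of the script it sits on. The function name `extract_highestid` and the sample HTML are made up for illustration, and the exact spacing in the real script may differ, which is why the pattern allows optional whitespace.

```python
import re

# Assumption: the page's <script> contains a statement like "highestid = 9297850;"
HIGHESTID_RE = re.compile(r"highestid\s*=\s*(\d+)")


def extract_highestid(page_text):
    """Return the highestid value as a string, or None if it is not found."""
    match = HIGHESTID_RE.search(page_text)
    return match.group(1) if match else None


# Made-up sample standing in for the real page source
sample = """
<script type="text/javascript">
var something = 1;
var highestid = 9297850;
</script>
"""
print(extract_highestid(sample))
```

With the real page, the input would be the response text of the tori_spy.php request.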
# Full example: read highestid from the page, then request the XML with newer posts
import requests
from lxml import html

url = "http://forum.toribash.com/tori_spy.php"

s = requests.session()
result = s.get(url)
tree = html.fromstring(result.content)

# highestid is set inside one of the page's <script> tags
for script in tree.xpath("//script"):
    if script.text and 'highestid' in script.text:
        # relies on the script's current layout: 4th line, value starts at character 13
        highestid = script.text.split('\n')[3]
        highestid = highestid[13:-1]
        print('highestid:', highestid)

# request the posts that are newer than highestid
result = s.get('http://forum.toribash.com/vaispy.php?do=xml&last='+highestid, headers=dict(referer=url))
#print(result.text)
data = html.fromstring(result.content)

for item in data.xpath('.//event'):
    print('--- event ---')
    print('id:', item.xpath('.//id')[0].text)
    print('postid:', item.xpath('.//postid')[0].text)
    print(item.xpath('.//preview')[0].text)
Current result (yours may differ):
highestid: 9297873
--- event ---
id: 9297883
postid: 9297883
me vende esse full valkyrie por 18k
--- event ---
id: 9297881
postid: 9297881
Congratz Goat! Welcome to the team! :)
--- event ---
id: 9297879
postid: 9297879
Try to reset your email password, then attempt to do what I suggested.
--- event ---
id: 9297877
postid: 9297877
Hello Nope. Most of these bugs are known to currently cause issues and they are being worked on. People pinging and rejoining are bots that are being dealt with (it's just an extensive process to...
--- event ---
id: 9297874
postid: 9297874
Bon courage :)
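To keep following new posts, the `<postid>` of the newest event can be sent back as `last` on the next request. Below is a minimal sketch of one such polling step, not a tested client for the site. The names `parse_events` and `poll_once`, the `<events>` root element in the sample, and the idea that the newest event has the largest `postid` are my assumptions. `session` is expected to be a `requests` session, and the sketch uses the standard library XML parser, which is stricter than the lxml HTML parser used earlier.

```python
import xml.etree.ElementTree as ET

SPY_XML_URL = "http://forum.toribash.com/vaispy.php"


def parse_events(xml_bytes):
    """Turn the XML response into a list of dicts with id, postid and preview."""
    root = ET.fromstring(xml_bytes)
    events = []
    for event in root.iter("event"):
        events.append({
            "id": event.findtext("id"),
            "postid": event.findtext("postid"),
            "preview": event.findtext("preview"),
        })
    return events


def poll_once(session, last_id):
    """Fetch events newer than last_id; return (events, last_id for the next call)."""
    response = session.get(SPY_XML_URL, params={"do": "xml", "last": last_id})
    events = parse_events(response.content)
    # Assumption: post IDs only grow, so the largest postid is the newest one
    post_ids = [int(e["postid"]) for e in events if e["postid"]]
    next_id = str(max(post_ids)) if post_ids else last_id
    return events, next_id


# Made-up response for illustration; the real root element name is an assumption
sample_xml = b"""<?xml version="1.0" encoding="UTF-8"?>
<events>
  <event><id>9297883</id><postid>9297883</postid><preview>first</preview></event>
  <event><id>9297881</id><postid>9297881</postid><preview>second</preview></event>
</events>"""

for event in parse_events(sample_xml):
    print(event["postid"], event["preview"])
```

In the bot, something like `events, last_id = poll_once(s, highestid)` could be called each time the command runs, keeping `last_id` around between calls so only newer posts come back.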