I wrote some code to scrape all the comments and usernames from a Reddit post, but it does not pick up everything.
What could be the problem?
Here is my code:
import requests
from bs4 import BeautifulSoup
listt = []
count = 0
username_list = []
comment_list = []
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.110 Safari/537.36"}
url = input("Please input reddit url:")
page = requests.get(url, headers=headers)
old_page_url = "https://old"+url[11:]
old_page = requests.get(old_page_url,headers=headers)
soup = BeautifulSoup(page.text,"html.parser")
old_soup = BeautifulSoup(old_page.text,"html.parser")
comments = soup.findAll('div',{'data-test-id':'comment'})
for one_comment in comments:
    comment_list.append(one_comment.text)
for name in old_soup.find_all("a"):
    listt.append(name.text)
for item in listt:
    if item == '[–]':
        username_list.append(listt[count+1])
    count+=1
for i in range(len((comment_list))):
    print(f"Comment made by u/{username_list[i]} = {comment_list[i]}")
Answer 0 (score: 0)
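My guess is that the remaining comments are only loaded as you scroll down?
In any case, if you need all the comments and usernames, use the Reddit API itself:
https://www.reddit.com/comments/{thread-id}.json
For example, https://www.reddit.com/comments/iv5jaa.json shows all the comments from https://www.reddit.com/r/DeepRockGalactic/comments/iv5jaa/.
Use a JSON parser to work with it :).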
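As a rough sketch of that approach (not a definitive implementation): the JSON returned for a thread is a two-element list, where the first listing holds the post and the second holds the comment tree. The function names `fetch_thread`, `extract_comments` and `print_thread_comments`, and the `USER_AGENT` string, are made up for this example. It assumes each comment is a child of kind `t1` with `author`, `body` and `replies` fields, and it skips the `more` placeholders that Reddit uses for collapsed comments on large threads, so very long threads may still need extra requests.

```python
import json
import urllib.request

# Hypothetical value for illustration; pick a descriptive User-Agent of your own.
USER_AGENT = "comment-scraper-example/0.1"


def fetch_thread(thread_id):
    """Download and parse https://www.reddit.com/comments/{thread_id}.json."""
    url = f"https://www.reddit.com/comments/{thread_id}.json"
    request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def extract_comments(listing):
    """Yield (author, body) pairs from a comment listing, including nested replies."""
    for child in listing["data"]["children"]:
        if child["kind"] != "t1":
            # Not a comment, e.g. a "more" placeholder for collapsed comments.
            continue
        data = child["data"]
        yield data["author"], data["body"]
        replies = data.get("replies")
        if replies:
            # "replies" is an empty string when a comment has no replies,
            # otherwise it is another listing with the same shape.
            yield from extract_comments(replies)


def print_thread_comments(thread_id):
    """Fetch a thread and print every comment with its author."""
    _post_listing, comment_listing = fetch_thread(thread_id)
    for author, body in extract_comments(comment_listing):
        print(f"Comment made by u/{author} = {body}")
```

Calling `print_thread_comments("iv5jaa")` would then print the comments of the example thread above. Because author and body come from the same JSON object, there is no need to keep two separate lists in step.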