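I've searched here, but I haven't found a post that helps me get this done.
Site: http://www.animefansftw.com/
I'm trying to get the h1 titles of only the posts from a set date. I'm able to get the actual posts for the set date, but I can't figure out how to get the h1 title of each of those posts.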
import time
import requests
import re
from bs4 import BeautifulSoup
Aniday = time.strftime("%B %d")
r = requests.get("http://www.animefansftw.com")
r.content
soup = BeautifulSoup(r.content, "html.parser")
print("Today's Animu Crack:\n")
for div in soup.find_all("div", {"class": "date"}):
    get_date = div.text
    clean_date = " ".join(get_date.split())
    if clean_date == Aniday:
        print(clean_date)
To avoid confusion: I can already get the h1 title of every post, but I don't want all of them, only the ones from the date I set.
for item in soup.find_all("h1"):
    info = item.text
    clean_info = " ".join(info.split())
    print(clean_info)
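Answer 0: (score: 0)
From a quick look at the page source, it looks like the h1 tag sits inside the grandparent (the parent's parent) of the date div.
Try: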
import time
import requests
import re
from bs4 import BeautifulSoup
Aniday = time.strftime("%B %d")
r = requests.get("http://www.animefansftw.com")
r.content
soup = BeautifulSoup(r.content, "html.parser")
print("Today's Animu Crack:\n")
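for div in soup.find_all("div", {"class": "date"}):
    get_date = div.text
    clean_date = " ".join(get_date.split())
    if clean_date == Aniday:
        # The h1 lives in the grandparent of the date div.
        post_div = div.parent.parent
        # Drop non-ASCII characters, then decode so title is text rather than bytes.
        title = post_div.h1.text.encode('ascii', 'ignore').decode('ascii')
        print("{title}\n{date}\n".format(title=title, date=clean_date))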
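If you want to check the idea without hitting the site, here is a minimal sketch that runs against an inline HTML snippet. The markup in SAMPLE_HTML is made up for illustration (the real page may be nested differently), and squash and titles_for_date are hypothetical helper names, not anything from the site or from BeautifulSoup. Instead of hard-coding div.parent.parent, it walks up div.parents until it finds an ancestor that contains an h1, which should hold up a little better if the nesting depth changes.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, assumed for illustration only: each post wraps
# a nested date div and an h1 title.
SAMPLE_HTML = """
<div class="post">
  <div class="meta"><div class="date"> March
      03 </div></div>
  <h1>First   post</h1>
</div>
<div class="post">
  <div class="meta"><div class="date">March 04</div></div>
  <h1>Second post</h1>
</div>
<div class="post">
  <div class="meta"><div class="date">March 03</div></div>
  <h1>Third post</h1>
</div>
"""


def squash(text):
    """Collapse runs of whitespace (including newlines) into single spaces."""
    return " ".join(text.split())


def titles_for_date(html, wanted_date):
    """Return the h1 titles of posts whose date div matches wanted_date."""
    soup = BeautifulSoup(html, "html.parser")
    titles = []
    for div in soup.find_all("div", {"class": "date"}):
        if squash(div.text) != wanted_date:
            continue
        # Walk up from the date div; the first ancestor that contains
        # an h1 is treated as the post container.
        for ancestor in div.parents:
            h1 = ancestor.find("h1")
            if h1 is not None:
                titles.append(squash(h1.text))
                break
    return titles


print(titles_for_date(SAMPLE_HTML, "March 03"))
# → ['First post', 'Third post']
```

To use it on the live page you would pass r.content and Aniday instead of the sample values. Note that time.strftime("%B %d") zero-pads the day, so the comparison only matches if the site prints dates the same way.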