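I'm new to Python. I wrote some code that downloads letters from a website. I want to loop through each URL in `EachLetter` and return only the text of the `p` elements whose class is `LETTER selectionShareable`. I'd also like to print each letter returned from `EachLetter` with its correct heading, and I thought I could use `zip` for this. Any help appreciated.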
```python
import urllib.request
import time
import bs4
from bs4 import BeautifulSoup
import sys
import urllib
from datetime import datetime
import itertools
# Starts Measuring the Time
start_time = time.time()
# Start Message
print("Program is Starting...")
# The URL for the response
resp = urllib.request.urlopen("https://www.irishtimes.com/opinion/letters")
# Making the Soup
soup = BeautifulSoup(resp, 'html.parser')
# Finding the 'divs'
divs = soup.find('div', {"class": "row sectionteaser"})
letters = {}
i=0
# Finding only the letters with the most recent date
for div in divs:
    if type(div) is bs4.element.Tag:
        i+=1
        letters[i] = [datetime.strptime(div.find('li', {'class': 'last first date'}).string,"%B %d, %Y"), div.find('a', {'class': 'gtm-event'})['href']]
newestdate = datetime(1900, 1, 1)
for letter in letters.items():
    if newestdate < letter[1][0]:
        newestdate = letter[1][0]
newestletters = []
for letter in letters.items():
    if letter[1][0] == newestdate:
        newestletters.append(letter[1][1])
for new in newestletters:
    ITurl = "https://www.irishtimes.com"
    EachLetter = ITurl + new
    response1 = urllib.request.urlopen(EachLetter)
    soup = BeautifulSoup(response1, 'html.parser')
    #print(soup.text)
    for each in EachLetter:
        letters_content = soup.findAll('div', {'class': "article_bodycopy"})
        for letter in letters_content:
            print(letter.find('p', {'class': "LETTER selectionShareable"}).get_text)
print("My program took", time.time() - start_time, "seconds to run")
```
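The date-filtering part of the code above builds a numbered dict and then scans it twice to find the newest date. A more compact sketch, assuming the scraped results are kept as a list of `(date, href)` tuples instead of a dict (the sample dates and hrefs below are hypothetical, not taken from the site), uses `max()` and a list comprehension:

```python
from datetime import datetime

# Hypothetical (date, href) pairs standing in for what the scraper collects
# from the "row sectionteaser" listing; the hrefs are made up for illustration.
letters = [
    (datetime(2019, 3, 4), "/opinion/letters/example-a"),
    (datetime(2019, 3, 5), "/opinion/letters/example-b"),
    (datetime(2019, 3, 5), "/opinion/letters/example-c"),
]

# The most recent date among all collected letters.
newestdate = max(date for date, _ in letters)

# Keep only the hrefs published on that date, in their original order.
newestletters = [href for date, href in letters if date == newestdate]

print(newestletters)  # ['/opinion/letters/example-b', '/opinion/letters/example-c']
```

Storing tuples in a list also removes the need for the manual counter `i` and the `letter[1][0]` / `letter[1][1]` indexing.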
Answer (score: 0)
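The problem is that `letter.find('p', {'class': "LETTER selectionShareable"})` sometimes returns `None` (when an `article_bodycopy` div contains no such paragraph), and accessing `.get_text` on `None` raises `AttributeError: 'NoneType' object has no attribute 'get_text'`.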
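A minimal, self-contained illustration of that behaviour, using a hypothetical two-div HTML snippet in place of the real page (one body div has the matching paragraph, the other does not):

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the first body div has a matching <p>, the second does not.
html = """
<div class="article_bodycopy"><p class="LETTER selectionShareable">Sir, first letter.</p></div>
<div class="article_bodycopy"><p class="byline">No letter here.</p></div>
"""

soup = BeautifulSoup(html, "html.parser")
texts = []
for body in soup.find_all("div", {"class": "article_bodycopy"}):
    found = body.find("p", {"class": "LETTER selectionShareable"})
    # find() returns None for the second div; accessing .get_text on it would
    # raise AttributeError: 'NoneType' object has no attribute 'get_text'
    if found is None:
        continue
    texts.append(found.get_text())

print(texts)  # ['Sir, first letter.']
```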
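If you change the second half of your program to:

```python
for new in newestletters:
    ITurl = "https://www.irishtimes.com"
    EachLetter = ITurl + new
    response1 = urllib.request.urlopen(EachLetter)
    soup = BeautifulSoup(response1, 'html.parser')
    letters_content = soup.find_all('div', {'class': "article_bodycopy"})
    for letter in letters_content:
        found_letter = letter.find('p', {'class': "LETTER selectionShareable"})
        # Skip body divs with no matching <p> (find() returned None)
        if found_letter:
            print(found_letter.get_text())
```

it should print what you are looking for. Two further fixes are included here. `get_text` is now called with parentheses; without them, `print` shows the bound method instead of the letter text. The inner `for each in EachLetter:` loop is gone, because iterating over a string yields its characters, so that loop printed every letter once per character of the URL.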
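The question also asks about printing each letter with its heading using `zip`. A minimal sketch, assuming each letter page has its heading in an `<h1>` element (a hypothetical selector; the real page markup may differ) and using inline sample HTML instead of fetching the site:

```python
from bs4 import BeautifulSoup

# Hypothetical page markup standing in for downloaded letter pages;
# the "h1" heading selector is an assumption, not taken from the real site.
pages = [
    '<h1>First heading</h1><div class="article_bodycopy">'
    '<p class="LETTER selectionShareable">Sir, one.</p></div>',
    '<h1>Second heading</h1><div class="article_bodycopy">'
    '<p class="LETTER selectionShareable">Sir, two.</p>'
    '<p class="LETTER selectionShareable">Sir, three.</p></div>',
]

headings = []
bodies = []
for html in pages:
    soup = BeautifulSoup(html, "html.parser")
    heading = soup.find("h1")
    # Fall back to a placeholder if the page has no <h1>.
    headings.append(heading.get_text(strip=True) if heading else "(no heading)")
    # Join all matching paragraphs on this page into one letter text.
    paragraphs = soup.find_all("p", {"class": "LETTER selectionShareable"})
    bodies.append("\n".join(p.get_text(strip=True) for p in paragraphs))

# zip pairs the i-th heading with the i-th body.
for heading, body in zip(headings, bodies):
    print(heading)
    print(body)
    print()
```

Because each heading and body are appended in the same loop iteration, the two lists always have the same length and stay aligned. `zip` stops at the shorter input, so filling the lists in separate loops could silently drop or mismatch letters; appending `(heading, body)` tuples to a single list would avoid needing `zip` at all.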