Beautiful Soup: cannot return the 'p' class

Time: 2018-03-09 20:07:02

Tags: python python-3.x beautifulsoup

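I am new to Python. I wrote some code to download letters from a website. I want to loop over each URL in EachLetter and return only the text inside the p elements with class LETTER selectionShareable. I would also like to print each letter returned from EachLetter under its correct heading, and I think I could use zip for that. Any help is appreciated.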

import urllib.request
import time
import bs4
from bs4 import BeautifulSoup
import sys
import urllib
from datetime import datetime
import itertools

# Starts Measuring the Time
start_time = time.time()
# Start Message
print("Program is Starting...")

# The URL for the response
resp = urllib.request.urlopen("https://www.irishtimes.com/opinion/letters")
# Making the Soup
soup = BeautifulSoup(resp, 'html.parser')

# Finding the 'divs'
divs = soup.find('div', {"class": "row sectionteaser"})
letters = {}
i=0

# Finding only the letters with the most recent date
for div in divs:
    if type(div) is bs4.element.Tag:
        i+=1
        letters[i] = [datetime.strptime(div.find('li', {'class': 'last first date'}).string,"%B %d, %Y"), div.find('a', {'class': 'gtm-event'})['href']]

newestdate = datetime(1900, 1, 1)


for letter in letters.items():
    if newestdate < letter[1][0]:
        newestdate = letter[1][0]


newestletters = []

for letter in letters.items():
    if letter[1][0] == newestdate:
        newestletters.append(letter[1][1])


for new in newestletters:
    ITurl = "https://www.irishtimes.com"
    EachLetter = ITurl + new


    response1 = urllib.request.urlopen(EachLetter)
    soup = BeautifulSoup(response1, 'html.parser')
    #print(soup.text)
    for each in EachLetter:
        letters_content = soup.findAll('div', {'class': "article_bodycopy"})
    for letter in letters_content:
        print(letter.find('p', {'class': "LETTER selectionShareable"}).get_text)


print("My program took", time.time() - start_time, "seconds to run")

1 answer:

Answer 0 (score: 0)

The problem is that:

letter.find('p', {'class': "LETTER selectionShareable"})

sometimes returns None.
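A minimal offline sketch of that behaviour. The HTML fragment below is made up for illustration (it is an assumption, not the actual Irish Times markup); only the class names come from the question:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: two body divs, only the first contains the target paragraph.
SAMPLE_HTML = """
<div class="article_bodycopy"><p class="LETTER selectionShareable">Sir, - First letter.</p></div>
<div class="article_bodycopy"><p class="other">No letter paragraph here.</p></div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
results = [
    div.find("p", {"class": "LETTER selectionShareable"})
    for div in soup.find_all("div", {"class": "article_bodycopy"})
]

# The second div has no matching <p>, so find() gives None for it.
# Accessing .get_text on None is what raises AttributeError.
for found in results:
    print(None if found is None else found.get_text())
```

Any div without a matching paragraph yields None, so the result has to be checked before its text is read.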

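If you change the second half of your program to:

for new in newestletters:
    response1 = urllib.request.urlopen("https://www.irishtimes.com" + new)
    soup = BeautifulSoup(response1, 'html.parser')
    # Query each page once; "for each in EachLetter" iterated over the URL's characters.
    letters_content = soup.findAll('div', {'class': "article_bodycopy"})

    for letter in letters_content:
        found_letter = letter.find('p', {'class': "LETTER selectionShareable"})
        # Skip divs where find() returned None.
        if found_letter:
            # get_text is a method, so call it.
            print(found_letter.get_text())

it should print what you are looking for.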
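As for printing each letter under its heading with zip, here is a rough sketch rather than a tested solution for the live site. The `pages` and `titles` lists and the `letter_text` helper are hypothetical stand-ins. In the real program the HTML would come from `urllib.request.urlopen`, and the headings would come from wherever you scrape them. It uses `find_all` instead of `find`, so every matching paragraph on a page is collected, not only the first.

```python
from bs4 import BeautifulSoup


def letter_text(html):
    """Return the text of every matching <p>, joined by blank lines ('' if none)."""
    soup = BeautifulSoup(html, "html.parser")
    paragraphs = soup.find_all("p", {"class": "LETTER selectionShareable"})
    return "\n\n".join(p.get_text(strip=True) for p in paragraphs)


# Hypothetical stand-ins for the downloaded pages and their headings.
pages = [
    '<div class="article_bodycopy"><p class="LETTER selectionShareable">Sir, - One.</p>'
    '<p class="LETTER selectionShareable">Yours, etc.</p></div>',
    '<div class="article_bodycopy"><p class="other">Not a letter.</p></div>',
]
titles = ["First heading", "Second heading"]

# zip() pairs each heading with the text extracted from the page at the same position.
paired = [(title, letter_text(html)) for title, html in zip(titles, pages)]

for title, text in paired:
    print(title)
    print(text if text else "(no matching paragraph)")
    print()
```

Note that zip stops at the shorter of the two lists, so the headings and pages need to be collected in the same order and have the same length.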