Using BS4 to get the items in a list raises AttributeError

Time: 2019-01-14 15:02:29

Tags: python beautifulsoup

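I'm trying to get today's information from one section of a Wikipedia article. I fetch the page and parse it with BS4, then look up the second ul on the page (which corresponds to all the text in the "Events" section). I need the contents of that section of the article. My current code is as follows: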

time = datetime.now()
day = time.strftime('%B') + '_' + str(int(time.strftime('%d')))
Label(text = 'ON THIS DAY', font = ('Verdana 12 bold')).grid(column = 1, row = 1, in_ = frame2, padx = 10)
url = 'https://en.wikipedia.org/wiki/' + str(day)
res = requests.get(url)
something = bs4.BeautifulSoup(res.text, features="html.parser")
events = something.find_all('ul')[1]
x = [x.text for x in events]
print(x)

The code shown above produces the following error from Python:

Traceback (most recent call last):
  File "D:\Program Files\Python\Python37\MyScripts\RSSFeed\RSSFeed.py", line 74, in <module>
    load()
  File "D:\Program Files\Python\Python37\MyScripts\RSSFeed\RSSFeed.py", line 71, in load
    onthisday()
  File "D:\Program Files\Python\Python37\MyScripts\RSSFeed\RSSFeed.py", line 64, in onthisday
    x = [x.text for x in events]
  File "D:\Program Files\Python\Python37\MyScripts\RSSFeed\RSSFeed.py", line 64, in <listcomp>
    x = [x.text for x in events]
  File "D:\Program Files\Python\Python37\lib\site-packages\bs4\element.py", line 742, in __getattr__
    self.__class__.__name__, attr))
AttributeError: 'NavigableString' object has no attribute 'text'

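I know this error is caused by the fact that events is just a single item from the list, but how do I work around it? (By the way, I have already looked at the answers to other questions that raise the same error as mine.)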

1 answer:

Answer 0: (score: 1)

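When you do soup.find_all('ul')[1], you are grabbing that one specific <ul> element. Iterating over it directly walks its child nodes, and those include the bare text nodes (NavigableString) that sit between the <li> tags, which is where the error comes from. To get at the items themselves you need another find_all inside that element. Alternatively, you could just turn the whole thing into text and then split on each line:

import requests
import bs4

response = requests.get('https://en.wikipedia.org/wiki/January_14')
soup = bs4.BeautifulSoup(response.text, 'html.parser')

events = soup.find_all('ul')[1]
events_list = events.text.split('\n')
print(events_list)

Or, if you really want to use a list comprehension as you originally planned, you'll have to find all the tags within events (I chose <li>), and then you can iterate over those.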
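To see where the NavigableString comes from, here is a small offline sketch. The HTML string is a made-up stand-in for the Wikipedia markup (an assumption, not the real page), and it only shows what iterating over a Tag yields and one way to skip the text nodes:

```python
import bs4

# Hypothetical stand-in for the page markup, so the sketch runs without a network call.
html = "<ul>\n<li>first event</li>\n<li>second event</li>\n</ul>"
events = bs4.BeautifulSoup(html, "html.parser").find("ul")

# Iterating over a Tag walks its direct children, text nodes included.
child_types = [type(child).__name__ for child in events]
print(child_types)

# Keeping only the Tag children skips the whitespace strings between the items.
texts = [child.text for child in events if isinstance(child, bs4.Tag)]
print(texts)
```

The newline between each pair of <li> tags is itself a child node, which is why the comprehension in the question trips over a NavigableString in the bs4 version shown in the traceback.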

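So here is the full code for this section (your script obviously has more in it, but this is just the relevant part, and you can carry on from there):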

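import requests
import bs4

response = requests.get('https://en.wikipedia.org/wiki/January_14')
soup = bs4.BeautifulSoup(response.text, 'html.parser')

# Second <ul> on the page, as in the question
events = soup.find_all('ul')[1]
# Only the <li> tags inside it, so no bare text nodes are iterated
indv_event = events.find_all('li')

x = [item.text for item in indv_event]
print(x)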
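Indexing find_all('ul')[1] depends on the page layout, so it may be worth guarding against a page that has fewer lists than expected. The following is only a sketch under assumptions: list_item_texts, its ul_index parameter, and the sample markup are hypothetical names and data introduced for illustration, not part of the original code.

```python
import bs4


def list_item_texts(html, ul_index=1):
    """Return the stripped text of each <li> in the ul_index-th <ul>, or [] if it is absent."""
    soup = bs4.BeautifulSoup(html, "html.parser")
    lists = soup.find_all("ul")
    if ul_index >= len(lists):
        # The page has fewer <ul> elements than expected.
        return []
    # get_text(strip=True) trims the surrounding whitespace of each item.
    return [li.get_text(strip=True) for li in lists[ul_index].find_all("li")]


# Made-up markup standing in for the fetched page (assumption).
sample = (
    "<ul><li>navigation</li></ul>"
    "<ul>\n<li> first event </li>\n<li>second event</li>\n</ul>"
)
print(list_item_texts(sample))
```

With the real page you would pass response.text instead of sample. Whether index 1 still points at the "Events" list depends on the current page markup.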