Question

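I'm a beginner struggling through a course, so this problem is probably really simple. I am running this (admittedly messy) code, saved as x.py, to extract a link and a name from a website whose lines are formatted like this: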

<li style="margin-top: 21px;">
  <a href="http://py4e-data.dr-chuck.net/known_by_Prabhjoit.html">Prabhjoit</a>
</li>

So I set this up:

import urllib.request, urllib.parse, urllib.error
from bs4 import BeautifulSoup
import ssl

# Ignore SSL certificate errors
ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE

url = input('Enter - ')
html = urllib.request.urlopen(url, context=ctx).read()
soup = BeautifulSoup(html, 'html.parser')
for line in soup:
    if not line.startswith('<li'):
        continue
    stuff = line.split('"')
    link = stuff[3]
    thing = stuff[4].split('<')
    name = thing[0].split('>')
    count = count + 1
    if count == 18:
        break
print(name[1])
print(link)

and it keeps producing this error:

Traceback (most recent call last):
  File "x.py", line 15, in <module>
    if not line.startswith('<li'):
TypeError: 'NoneType' object is not callable

I've been struggling with this for hours, and I would be grateful for any suggestions.

Answer 1

line is not a string, and it has no startswith() method. It is a BeautifulSoup Tag object, because BeautifulSoup has parsed the HTML source text into a rich object model. Don't try to treat it as text!

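The error occurs because, when you access an attribute that a Tag object doesn't know about, it searches for a child element with that name (so here it executes line.find('startswith')). Since there is no element with that name, None is returned. The code then tries to call that None as if it were the startswith() method, which fails with the error you see.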
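The attribute lookup described above can be reproduced without any network access. This is a minimal sketch, assuming a small hand-written HTML string modeled on the line format in the question; the variable names are illustrative only.

```python
from bs4 import BeautifulSoup

# Hand-written sample markup (an assumption, modeled on the question's format).
html = (
    '<ul><li style="margin-top: 21px;">'
    '<a href="http://py4e-data.dr-chuck.net/known_by_Prabhjoit.html">Prabhjoit</a>'
    '</li></ul>'
)
soup = BeautifulSoup(html, "html.parser")

node = soup.contents[0]      # the top-level <ul> Tag, not a string
unknown = node.startswith    # looks for a <startswith> child element: None
known = node.li              # same mechanism, but an <li> child does exist

try:
    node.startswith("<li")   # equivalent to calling None("<li")
    error = None
except TypeError as exc:
    error = str(exc)

print(unknown, known.name, error)
```

The same shortcut that makes node.li convenient is what turns a misspelled or string-style method name into a silent None.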

If you want to find the 18th <li> element, just ask BeautifulSoup for that specific element:

soup = BeautifulSoup(html, 'html.parser')
li_link_elements = soup.select('li a[href]', limit=18)
if len(li_link_elements) == 18:
    last = li_link_elements[-1]
    print(last.get_text())
    print(last['href'])

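This uses a CSS selector to find only <a> link elements that are nested inside an <li> element and that have an href attribute. The search is limited to 18 such tags, and the last one is printed, but only if we actually found 18 of them in the page.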
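To see what the selector does and does not match, here is a small sketch on a hand-written HTML fragment (the markup and the /one, /nested, /outside paths are made up for illustration). It also contrasts the descendant form used above with the stricter child form li > a[href], which is not part of the answer and is shown only for comparison.

```python
from bs4 import BeautifulSoup

# Made-up markup covering the cases the selector has to distinguish.
html = """
<a href="/outside">outside any list item</a>
<ul>
  <li><a href="/one">one</a></li>
  <li><a name="anchor">no href attribute</a></li>
  <li><span><a href="/nested">nested deeper</a></span></li>
</ul>
"""
soup = BeautifulSoup(html, "html.parser")

# 'li a[href]': any <a> with an href somewhere inside an <li>.
descendants = [a["href"] for a in soup.select("li a[href]")]

# 'li > a[href]': only <a> tags that are direct children of an <li>.
children = [a["href"] for a in soup.select("li > a[href]")]

# limit stops the search once that many matches have been collected.
limited = soup.select("li a[href]", limit=1)

print(descendants, children, len(limited))
```

The link outside the list and the anchor without an href are skipped in both cases; only the nested link differs between the two selectors.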

The element text is retrieved with the tag's get_text() method, which includes text from any nested elements (such as <span> or <strong> or other extra markup), and the href attribute is accessed using standard indexing notation.
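A short sketch of both accessors, assuming a hand-written list item in which part of the name is wrapped in <strong> (that extra markup is an assumption added to show the nesting behavior):

```python
from bs4 import BeautifulSoup

# Hand-written sample: part of the link text sits inside a nested <strong>.
html = (
    '<li><a href="http://py4e-data.dr-chuck.net/known_by_Prabhjoit.html">'
    '<strong>Prabh</strong>joit</a></li>'
)
soup = BeautifulSoup(html, "html.parser")
link = soup.select_one("li a[href]")

text = link.get_text()       # joins the text of the tag and its nested tags
href = link["href"]          # dictionary-style access to an attribute
title = link.get("title")    # .get() returns None for a missing attribute

print(text)
print(href)
print(title)
```

Indexing with link["title"] would raise a KeyError here, so .get() is the safer choice when an attribute may be absent.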

Keep getting "TypeError: 'NoneType' object is not callable" with Beautiful Soup and Python 3
