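The error I'm getting is:
'NoneType' object has no attribute 'lower'
The thing is, the script worked before I wrote the second function, but now it's being temperamental. I've only just started using PyCharm, so I'm new to all this.
Here is my code: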
import requests
import sys
from bs4 import BeautifulSoup
import operator


def start(url):
    word_list = []
    source_code = requests.get(url).text
    soup = BeautifulSoup(source_code, 'html.parser')
    for post_text in soup.find_all('p'):
        content = post_text.string
        words = content.lower().split()
        for word in words:
            word_list.append(word)
    clean_up_list(word_list)


def clean_up_list(word_list):
    clean_word_list = []
    for word in word_list:
        accepted = "abcdefghijklmnopqrstuvwxyz\'"
        for c in list(word):
            if c not in list(accepted):
                word = word.replace(c, "")
        if len(word) > 0:
            print(word)
            clean_up_list().append(word)


start('http://www.nameofwebsite.com/')
Answer 0 (score: 1)
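This happens because post_text.string is None for at least one of the p tags: that tag has no text directly inside it, so .string returns None.
So when you do words = content.lower().split(), you are actually calling .lower() on None, which has no .lower attribute.
What you can do is add an if statement that skips those tags.
Modified loop: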
for post_text in soup.find_all('p'):
    content = post_text.string
    if content is None:  # Checking if content is None
        continue
    words = content.lower().split()
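Put together with the rest of the script, the check might look like the sketch below. It makes a few assumptions that are not in the question: the HTML is passed in as a string so the example runs without a network request, `collect_words` is a hypothetical name, and the cleaned words are appended to `clean_word_list` directly, because `clean_up_list().append(word)` in the question would call the function again without its argument.

```python
from bs4 import BeautifulSoup

ACCEPTED = "abcdefghijklmnopqrstuvwxyz'"


def clean_up_list(word_list):
    """Strip characters outside ACCEPTED and drop words that end up empty."""
    clean_word_list = []
    for word in word_list:
        cleaned = "".join(c for c in word if c in ACCEPTED)
        if cleaned:
            clean_word_list.append(cleaned)
    return clean_word_list


def collect_words(html):
    """Hypothetical helper: gather lowercase words from every <p> in html."""
    word_list = []
    soup = BeautifulSoup(html, "html.parser")
    for post_text in soup.find_all("p"):
        content = post_text.string
        if content is None:  # no single text child, so nothing to lower()
            continue
        word_list.extend(content.lower().split())
    return clean_up_list(word_list)


# Assumed sample input: one normal <p>, one empty <p>, one <p> with two children.
sample = "<p>Hello, World!</p><p></p><p><b>a</b><i>b</i></p>"
words = collect_words(sample)
print(words)
```

Only the first paragraph contributes words here; the other two are skipped because their .string is None.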
Answer 1 (score: 1)
Here is an example that produces the error:
from bs4 import BeautifulSoup

soup = BeautifulSoup(
    '<p><div>hello</div><div>world</div></p>',
    'html.parser'
)

for p in soup.find_all('p'):
    print(repr(p.string))

--output:--
None
From the BeautifulSoup docs:
.string
If a tag has only one child, and that child is a NavigableString, the child is made available as .string.
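A small sketch of that rule, using made-up `id` attributes purely to label the three cases: a tag with a single text child, a tag whose only child is another tag (which, per the same docs, passes the child's .string up to the parent), and a tag with more than one child.

```python
from bs4 import BeautifulSoup

# The id values are assumptions used only to label each case.
soup = BeautifulSoup(
    "<p id='one'>hello</p>"
    "<p id='nested'><b>hello</b></p>"
    "<p id='two'>hello <b>world</b></p>",
    "html.parser",
)

results = {p["id"]: p.string for p in soup.find_all("p")}
for tag_id, value in results.items():
    print(tag_id, repr(value))
```

Only the last paragraph has more than one child, so it is the only one whose .string is None.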
Instead, you can use get_text():
from bs4 import BeautifulSoup

soup = BeautifulSoup(
    '<p><div>hello</div><div>world</div></p>',
    'html.parser'
)

for p in soup.find_all('p'):
    print(p.get_text())

--output:--
helloworld
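Note that the two words run together in that output, which matters if the text is later passed to .split(). One possible way around it, sketched here with the same markup, is to pass get_text() its separator and strip arguments:

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup(
    "<p><div>hello</div><div>world</div></p>",
    "html.parser",
)

p = soup.find("p")
joined = p.get_text()                            # pieces concatenated as-is
spaced = p.get_text(separator=" ", strip=True)   # pieces stripped, joined by a space
print(repr(joined))
print(repr(spaced))
```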
Or .strings:
from bs4 import BeautifulSoup

soup = BeautifulSoup(
    '<p><div>hello</div><div>world</div></p>',
    'html.parser'
)

for p in soup.find_all('p'):
    for string in p.strings:
        print(string)

--output:--
hello
world
However, .strings also returns whitespace-only strings (spaces, tabs, newlines):
from bs4 import BeautifulSoup

# Every line inside the <p> ends with a newline, and the next line starts
# with spaces (or a tab); those become whitespace-only strings.
soup = BeautifulSoup(
    '''
<p>
    <div>hello</div>
    <div>world</div>
</p>
''',
    'html.parser'
)

for p in soup.find_all('p'):
    for string in p.strings:
        print(repr(string))  # repr() makes the whitespace visible

--output:--
'\n    '
'hello'
'\n    '
'world'
'\n'
To skip the whitespace, you can use .stripped_strings:
from bs4 import BeautifulSoup

soup = BeautifulSoup(
    '''
<p>
    <div>hello</div>
    <div>world</div>
</p>
''',
    'html.parser'
)

for p in soup.find_all('p'):
    for string in p.stripped_strings:
        print(string)

--output:--
hello
world
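Applied to the loop from the question, .stripped_strings could replace .string so that paragraphs containing nested tags still contribute their words instead of raising the error or being skipped. This is only a sketch: `paragraph_words` is a hypothetical helper, and the sample markup is assumed for illustration.

```python
from bs4 import BeautifulSoup


def paragraph_words(html):
    """Hypothetical helper: lowercase words from each <p>, nested tags included."""
    soup = BeautifulSoup(html, "html.parser")
    words = []
    for p in soup.find_all("p"):
        # stripped_strings yields nothing for an empty <p>, so no None check is needed
        for text in p.stripped_strings:
            words.extend(text.lower().split())
    return words


# Assumed sample input: plain text, mixed text and tags, and an empty paragraph.
sample = """
<p>Plain text</p>
<p>Mixed <b>Bold</b> text</p>
<p></p>
"""
words = paragraph_words(sample)
print(words)
```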