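This is Python code that uses the BeautifulSoup library to scrape content from a GitHub repositories page. I'm getting the error:
'NoneType' object has no attribute 'text'
in this simple code. The error occurs at the two lines I've marked with comments in the code.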
import requests
from bs4 import BeautifulSoup
import csv
URL = "https://github.com/DURGESHBARWAL?tab=repositories"
r = requests.get(URL)
soup = BeautifulSoup(r.text, 'html.parser')
repos = []
table = soup.find('ul', attrs = {'data-filterable-for':'your-repos-filter'})
for row in table.find_all('li', attrs = {'itemprop':'owns'}):
    repo = {}
    repo['name'] = row.find('div').find('h3').a.text
    #First Error Position
    repo['desc'] = row.find('div').p.text
    #Second Error Position
    repo['lang'] = row.find('div', attrs = {'class':'f6 text-gray mt-2'}).find('span', attrs = {'class':'mr-3'}).text
    repos.append(repo)
filename = 'extract.csv'
with open(filename, 'w') as f:
    w = csv.DictWriter(f,['name','desc','lang'])
    w.writeheader()
    for repo in repos:
        w.writerow(repo)
Output
Traceback (most recent call last):
  File "webscrapping.py", line 16, in <module>
    repo['desc'] = row.find('div').p.text
AttributeError: 'NoneType' object has no attribute 'text'
Answer 0 (score: 0)
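This happens because looking up an element with BeautifulSoup behaves like a dict.get() call. When you call find, it gets the element from the element tree; if it can't find one, it returns None instead of raising an Exception. None doesn't have the attributes an Element has, such as text, attrs, and so on. So when you access Element.text without a try/except or without checking the type, you are gambling that the element will always be there.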
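To see this offline, here is a minimal sketch against a small, made-up HTML snippet (not GitHub's real markup): find() on a missing tag returns None, just as dict.get() does for a missing key, and reading .text from that None raises the same AttributeError as in the question.

```python
from bs4 import BeautifulSoup

# Made-up markup: the first <li> has a description <p>, the second does not.
HTML = """
<ul>
  <li><div><h3><a>repo-one</a></h3><p>First description</p></div></li>
  <li><div><h3><a>repo-two</a></h3></div></li>
</ul>
"""

soup = BeautifulSoup(HTML, "html.parser")
rows = soup.find_all("li")

# dict.get() returns None for a missing key instead of raising KeyError...
print({"a": 1}.get("b"))  # None

# ...and find() likewise returns None when no tag matches.
missing = rows[1].find("div").find("p")
print(missing)  # None

try:
    missing.text  # None has no .text attribute
except AttributeError as exc:
    print("AttributeError:", exc)
```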
I would probably first store the element that causes the problem in a temporary variable so you can type-check it, or else wrap the access in a try/except.
Personally, I lean toward try/except because it is more concise, and catching exceptions is a good way to make a program more robust.
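Both options, sketched on a hypothetical row whose <div> has no <p> (the markup is made up for illustration):

```python
from bs4 import BeautifulSoup

# Hypothetical row: its <div> has a heading but no <p> description.
row = BeautifulSoup("<li><div><h3><a>repo-two</a></h3></div></li>", "html.parser").li

# Option 1: keep the lookup in a temporary variable and check it before use.
p = row.find("div").find("p")
desc_checked = p.text if p is not None else None

# Option 2: attempt the access and catch the AttributeError raised on None.
try:
    desc_caught = row.find("div").find("p").text
except AttributeError:
    desc_caught = None

print(desc_checked, desc_caught)  # None None
```

One difference: the try/except also covers a missing <div> anywhere in the chain, whereas the explicit check only guards the final find(). Keeping the try block limited to that single lookup avoids accidentally hiding unrelated AttributeErrors.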
Answer 1 (score: 0)
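Your find calls are imprecise and chained, so when the first <div> in a row has no p child, you get None, but the code then goes on to access the .text attribute of None, which crashes the program with an AttributeError.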
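The chain row.find('div').p fails as soon as any link returns None. Here is a minimal sketch of one way to guard the whole chain, using a hypothetical helper find_chain (not part of BeautifulSoup) and made-up markup, that follows the same lookups but stops at the first missing element:

```python
from bs4 import BeautifulSoup

def find_chain(tag, *names):
    """Hypothetical helper: apply successive find() calls, returning None at the first miss."""
    for name in names:
        if tag is None:
            return None  # an earlier step found nothing; don't call .find on None
        tag = tag.find(name)
    return tag

# Made-up markup: the first row has a <p> in its <div>, the second does not.
HTML = """
<li><div><h3><a>repo-one</a></h3><p>First description</p></div></li>
<li><div><h3><a>repo-two</a></h3></div></li>
"""
rows = BeautifulSoup(HTML, "html.parser").find_all("li")

for row in rows:
    p = find_chain(row, "div", "p")
    print(p.text if p is not None else None)
```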
Try the following set of .find calls, which use the itemprop attributes you're after, and use a try-except block to leave any missing fields as None:
import requests
from bs4 import BeautifulSoup
import csv
URL = "https://github.com/DURGESHBARWAL?tab=repositories"
r = requests.get(URL)
soup = BeautifulSoup(r.text, 'html.parser')
repos = []
table = soup.find('ul', attrs = {'data-filterable-for': 'your-repos-filter'})
for row in table.find_all('li', {'itemprop': 'owns'}):
    # Look each field up by its itemprop; find() returns None if it is absent
    repo = {
        'name': row.find('a', {'itemprop' : 'name codeRepository'}),
        'desc': row.find('p', {'itemprop' : 'description'}),
        'lang': row.find('span', {'itemprop' : 'programmingLanguage'})
    }
    # Replace each found tag with its stripped text; missing fields stay None
    for k, v in repo.items():
        try:
            repo[k] = v.text.strip()
        except AttributeError:
            pass
    repos.append(repo)
filename = 'extract.csv'
# newline='' is what the csv module docs recommend for files given to csv writers
with open(filename, 'w', newline='') as f:
    w = csv.DictWriter(f,['name','desc','lang'])
    w.writeheader()
    for repo in repos:
        w.writerow(repo)
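With the loop above, any field that wasn't found stays None. Here is a quick stdlib-only check of what csv.DictWriter does with such a value. It uses two made-up rows and an in-memory buffer instead of extract.csv; None ends up as an empty field:

```python
import csv
import io

# Two made-up rows in the scraper's shape; the second has no description.
repos = [
    {"name": "Sandesh", "desc": "Bot application", "lang": "Python"},
    {"name": "python_notes", "desc": None, "lang": "Jupyter Notebook"},
]

# io.StringIO stands in for open('extract.csv', 'w', newline='') here.
buf = io.StringIO()
w = csv.DictWriter(buf, ["name", "desc", "lang"])
w.writeheader()
w.writerows(repos)

print(buf.getvalue())
```

For a real file, passing newline='' to open() keeps the csv module's \r\n line endings from being turned into \r\r\n on Windows, which would otherwise show up as blank rows between records.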
Debug output (in addition to the CSV that is written):
[ { 'desc': 'This a Django-Python Powered a simple functionality based '
            'Bot application',
    'lang': 'Python',
    'name': 'Sandesh'},
  {'desc': None, 'lang': 'Jupyter Notebook', 'name': 'python_notes'},
  { 'desc': 'Installing DSpace using docker',
    'lang': 'Java',
    'name': 'DSpace-Docker-Installation-1'},
  { 'desc': 'This Repo Contains the DSpace Installation Steps',
    'lang': None,
    'name': 'DSpace-Installation'},
  { 'desc': '(Official) The DSpace digital asset management system that '
            'powers your Institutional Repository',
    'lang': 'Java',
    'name': 'DSpace'},
  { 'desc': 'This Repo contain the DSpace installation steps with '
            'docker.',
    'lang': None,
    'name': 'DSpace-Docker-Installation'},
  { 'desc': 'This Repository contain the Intermediate system for the '
            'Collaboration and DSpace System',
    'lang': 'Python',
    'name': 'Community-OER-Repository'},
  { 'desc': 'A class website to share the knowledge and expanding the '
            'productivity through digital communication.',
    'lang': 'PHP',
    'name': 'class-website'},
  { 'desc': 'This is a POC for the Voting System. It is a precise '
            'design and implementation of Voting System based on the '
            'features of Blockchain which has the potential to '
            'substitute the traditional e-ballet/EVM system for voting '
            'purpose.',
    'lang': 'Python',
    'name': 'Blockchain-Based-Ballot-System'},
  { 'desc': 'It is a short describtion of Modern Django',
    'lang': 'Python',
    'name': 'modern-django'},
  { 'desc': 'It is just for the sample work.',
    'lang': 'HTML',
    'name': 'Task'},
  { 'desc': 'This Repo contain the sorting algorithms in C,predefiend '
            'function of C, C++ and Java',
    'lang': 'C',
    'name': 'Sorting_Algos_Predefined_functions'},
  { 'desc': 'It is a arduino program, for monitor the temperature and '
            'humidity from sensor DHT11.',
    'lang': 'C++',
    'name': 'DHT_11_Arduino'},
  { 'desc': "This is a registration from,which collect data from user's "
            'desktop and put into database after validation.',
    'lang': 'PHP',
    'name': 'Registration_Form'},
  { 'desc': 'It is a dynamic multi-part data driven search engine in '
            'PHP & MySQL from absolutely scratch for the website.',
    'lang': 'PHP',
    'name': 'search_engine'},
  { 'desc': 'It is just for learning github.',
    'lang': None,
    'name': 'Hello_world'}]
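The debug dump above lists each dict's keys alphabetically (desc, lang, name), even though the code builds them in the order name, desc, lang. That fits output printed with pprint, which sorts dictionary keys by default. The answer doesn't show its print call, so the sketch below is an assumption, and so is its indent value:

```python
import pprint

# Two rows in the scraper's shape: keys inserted in the order name, desc, lang.
repos = [
    {"name": "python_notes", "desc": None, "lang": "Jupyter Notebook"},
    {"name": "Hello_world", "desc": "It is just for learning github.", "lang": None},
]

# pprint sorts dict keys by default (sort_dicts=True), so 'desc' prints first.
# indent=2 is an assumed value; the answer does not show its actual call.
pprint.pprint(repos, indent=2)
```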