I get the error

    'NoneType' object has no attribute 'encode'

when I run this code:

    url = soup.find('div', attrs={"class": "entry-content"}).findAll('div', attrs={"class": None})
    fobj = open('D:\Scrapping\parveen_urls.txt', 'w')
    for getting in url:
        fobj.write(getting.string.encode('utf8'))

But when I use find instead of findAll, I do get one URL. How do I get all the URLs from the object with findAll?
Answer 0 (score: 3)
'NoneType' object has no attribute 'encode'

You are using .string. If a tag has more than one child, .string is None (docs):

    If a tag's only child is another tag, and that tag has a .string, then the parent tag is considered to have the same .string as its child.

Use .get_text() instead.
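A minimal sketch of the difference (the sample HTML here is invented for illustration):

```python
from bs4 import BeautifulSoup

html = '<div>plain text</div><div><a href="#">link</a> plus text</div>'
soup = BeautifulSoup(html, 'html.parser')

single, multi = soup.find_all('div')

# .string is only set when the tag has exactly one child
print(single.string)     # plain text
print(multi.string)      # None -> calling .encode() on this raises AttributeError

# .get_text() joins the text of all descendants, so it never returns None
print(multi.get_text())  # link plus text
```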
Answer 1 (score: 1)
Below I provide two examples and a possible solution:
First, a working example (the html contains the expected "entry-content" div):

    from bs4 import BeautifulSoup

    doc = ['<html><head><title>Page title</title></head>',
           '<body><div class="entry-content"><div>http://teste.com</div>',
           '<div>http://teste2.com</div></div></body>',
           '</html>']
    soup = BeautifulSoup(''.join(doc))
    url = soup.find('div', attrs={"class": "entry-content"}).findAll('div', attrs={"class": None})
    fobj = open('.\parveen_urls.txt', 'w')
    for getting in url:
        fobj.write(getting.string.encode('utf8'))
Second, an example that reproduces the error (the "entry-content" div is missing):

    doc = ['<html><head><title>Page title</title></head>',
           '<body><div class="entry"><div>http://teste.com</div>',
           '<div>http://teste2.com</div></div></body>',
           '</html>']
    soup = BeautifulSoup(''.join(doc))
    """
    The error is raised here: the first find() returns nothing, and
    nothing equals None. Calling findAll on a None object raises
    AttributeError: 'NoneType' object has no attribute 'findAll'
    """
    url = soup.find('div', attrs={"class": "entry-content"}).findAll('div', attrs={"class": None})
    fobj = open('.\parveen_urls2.txt', 'w')
    for getting in url:
        fobj.write(getting.string.encode('utf8'))
Finally, a possible solution that deals with documents that do not have the expected html structure:

    doc = ['<html><head><title>Page title</title></head>',
           '<body><div class="entry"><div>http://teste.com</div>',
           '<div>http://teste2.com</div></div></body>',
           '</html>']
    soup = BeautifulSoup(''.join(doc))
    url = soup.find('div', attrs={"class": "entry-content"})
    if url:
        url = url.findAll('div', attrs={"class": None})
        fobj = open('.\parveen_urls2.txt', 'w')
        for getting in url:
            fobj.write(getting.string.encode('utf8'))
    else:
        print("The html source does not comply with expected structure")
Answer 2 (score: 0)
I found that the problem was caused by null data. I fixed it by filtering out the null entries.
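One way to sketch that filtering fix, assuming the same structure as in the question (the sample HTML is invented, with one inner div that has no direct string):

```python
from bs4 import BeautifulSoup

doc = ('<div class="entry-content">'
       '<div>http://teste.com</div>'
       '<div><b>no direct string</b> here</div>'
       '</div>')
soup = BeautifulSoup(doc, 'html.parser')

divs = soup.find('div', attrs={"class": "entry-content"}) \
           .findAll('div', attrs={"class": None})

# Skip entries whose .string is None instead of crashing on .encode()
urls = [d.string for d in divs if d.string is not None]
print(urls)  # ['http://teste.com']
```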