I'm running into an error in my code. Here it is:
#!/usr/bin/env python2
# -*- coding: utf-8 -*-
from BeautifulSoup import BeautifulSoup as beatsop
from BeautifulSoup import SoupStrainer as sopstrain
import urllib2

def html_parser(html_data):
    html_proc = beatsop(html_data)
    onlyforms = sopstrain("form")
    forms1 = html_proc.findAll(onlyforms)
    txtinput = forms1.findAll('input', {'type': 'text'})
    # We take the inputs that aren't text
    listform = ["radio", "checkbox", "password", "file", "image", "hidden"]
    otrimput = forms1.findAll('input', {'type': listform})
    # we search for names in text inputs
    for e_name in txtinput:
        names = e_name['name']
    # we search for values in otrimput
    for e_value in otrimput:
        value1 = e_value.get('value')
        if value1:
            pass
        else:
            print('{} there is no value.'.format(e_value))

html_data = urllib2.urlopen("http://www.google.com")
html_parser(html_data)
So that's the code: it connects to Google and searches for forms (using SoupStrainer). Everything looks fine, but it fails with this error:
txtinput = forms1.findAll('input', {'type': 'text'})
AttributeError: 'ResultSet' object has no attribute 'findAll'
I think the error is that forms1 is a list, but I can't figure out how to modify the code to make it work.
Thanks, everyone!
Answer 0 (score: 2)
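Yes, findAll returns a ResultSet, which is a kind of list, and a list has no findAll method of its own. So you can either pick a single element from it or iterate over all of them. The code below shows the iteration.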
#!/usr/bin/env python2
# -*- coding: utf-8 -*-
from BeautifulSoup import BeautifulSoup as beatsop
from BeautifulSoup import SoupStrainer as sopstrain
import urllib2

def html_parser(html_data):
    html_proc = beatsop(html_data)
    onlyforms = sopstrain("form")
    forms1 = html_proc.findAll(onlyforms)
    # forms1 is a ResultSet (a list), so handle each form on its own
    for found_form in forms1:
        txtinput = found_form.findAll('input', {'type': 'text'})
        # We take the inputs that aren't text
        listform = ["radio", "checkbox", "password", "file", "image", "hidden"]
        otrimput = found_form.findAll('input', {'type': listform})
        # we search for names in text inputs
        for e_name in txtinput:
            names = e_name['name']
        # we search for values in otrimput
        for e_value in otrimput:
            value1 = e_value.get('value')
            if value1:
                pass
            else:
                print('{} there is no value.'.format(e_value))

html_data = urllib2.urlopen("http://www.google.com")
html_parser(html_data)
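
The same idea (the search gives back a list of forms, so each form has to be processed individually) can also be sketched without BeautifulSoup. The example below is only an illustration under a few assumptions: it targets Python 3 and swaps in the standard library's html.parser module rather than the BeautifulSoup API used above. The names FormCollector, inputs_without_value, NON_TEXT_TYPES, and the sample HTML are hypothetical and not part of the original code.

```python
from html.parser import HTMLParser

# Input types other than "text", taken from the list in the question.
NON_TEXT_TYPES = {"radio", "checkbox", "password", "file", "image", "hidden"}


class FormCollector(HTMLParser):
    """Hypothetical helper: collects the <input> attributes of each <form>."""

    def __init__(self):
        super().__init__()
        self.forms = []        # one list of attribute dicts per <form>
        self._current = None   # inputs of the form being parsed, if any

    def handle_starttag(self, tag, attrs):
        if tag == "form":
            self._current = []
            self.forms.append(self._current)
        elif tag == "input" and self._current is not None:
            self._current.append(dict(attrs))

    def handle_endtag(self, tag):
        if tag == "form":
            self._current = None


def inputs_without_value(form_inputs):
    """Return the non-text inputs of ONE form that have no value."""
    return [attrs for attrs in form_inputs
            if attrs.get("type") in NON_TEXT_TYPES and not attrs.get("value")]


# Made-up sample page, used here instead of fetching a live URL.
SAMPLE_HTML = """
<form action="/search">
  <input type="text" name="q">
  <input type="hidden" name="hl" value="en">
  <input type="hidden" name="source">
</form>
<form action="/login">
  <input type="password" name="pw">
</form>
"""

collector = FormCollector()
collector.feed(SAMPLE_HTML)

# collector.forms is a plain list, so iterate over it form by form
# (or pick a single form with collector.forms[0]).
for index, form_inputs in enumerate(collector.forms):
    text_names = [a.get("name") for a in form_inputs if a.get("type") == "text"]
    print("form {}: text inputs {}".format(index, text_names))
    for attrs in inputs_without_value(form_inputs):
        print("  {} has no value.".format(attrs.get("name")))
```

As in the answer, the per-form work happens inside the loop. The sketch reads attributes with .get, so an input that lacks a name or value attribute yields None instead of raising an exception.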