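I'm trying to use BeautifulSoup 4. It installed successfully, but I keep getting errors that I can't fix, all triggered by the line "soup = BeautifulSoup(html)".
When I use the following code: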
from bs4 import BeautifulSoup
soup = BeautifulSoup(html)
I get this error:
//anaconda/lib/python3.5/site-packages/bs4/__init__.py:166: UserWarning: No parser was explicitly specified, so I'm using the best available HTML parser for this system ("lxml"). This usually isn't a problem, but if you run this code on another system, or in a different virtual environment, it may use a different parser and behave differently.
To get rid of this warning, change this:
BeautifulSoup([your markup])
to this:
BeautifulSoup([your markup], "lxml")
markup_type=markup_type))
Traceback (most recent call last):
File "<ipython-input-13-d4b16f497b1d>", line 1, in <module>
runfile('/Users/beckswu/Desktop/coursera/using python access web data/class 2.py', wdir='/Users/beckswu/Desktop/coursera/using python access web data')
File "//anaconda/lib/python3.5/site-packages/spyderlib/widgets/externalshell/sitecustomize.py", line 699, in runfile
execfile(filename, namespace)
File "//anaconda/lib/python3.5/site-packages/spyderlib/widgets/externalshell/sitecustomize.py", line 88, in execfile
exec(compile(open(filename, 'rb').read(), filename, 'exec'), namespace)
File "/Users/beckswu/Desktop/coursera/using python access web data/class 2.py", line 37, in <module>
soup = BeautifulSoup(html)
File "//anaconda/lib/python3.5/site-packages/bs4/__init__.py", line 212, in __init__
markup, from_encoding, exclude_encodings=exclude_encodings)):
File "//anaconda/lib/python3.5/site-packages/bs4/builder/_lxml.py", line 108, in prepare_markup
markup, try_encodings, is_html, exclude_encodings)
TypeError: __init__() takes from 2 to 4 positional arguments but 5 were given
Then I changed the code to:
from bs4 import BeautifulSoup
soup = BeautifulSoup(html,"lxml")
markup_type=markup_type))
This also produces an error:
markup_type=markup_type))
^
SyntaxError: invalid syntax
How can I fix this? I'd appreciate any help.
Answer 0 (score: 0)
I believe there is an error in your code. Try this instead:
from bs4 import BeautifulSoup
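# html holds the markup to parse; a short literal stands in for it here
html = "<p>Hello</p>"
# if you decide to use Python's built-in html.parser as the parser
soup = BeautifulSoup(html, features="html.parser")
# The third parameter is builder, which defaults to None, so you don't have to pass it. There is no markup_type parameter.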
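The SyntaxError in the question appears to come from the line `markup_type=markup_type))`, which looks like the tail of the warning message pasted into the script rather than anything BeautifulSoup needs. A minimal sketch, using only the standard library, that checks both versions of the snippet without executing them (the helper name `is_valid_python` is made up for illustration):

```python
import ast

# The snippet from the question, including the stray line copied from the warning.
with_stray_line = (
    "from bs4 import BeautifulSoup\n"
    "soup = BeautifulSoup(html, 'lxml')\n"
    "markup_type=markup_type))\n"
)

# The same snippet with the stray line removed.
without_stray_line = (
    "from bs4 import BeautifulSoup\n"
    "soup = BeautifulSoup(html, 'lxml')\n"
)


def is_valid_python(source: str) -> bool:
    """Return True if the source parses as Python. Nothing is executed."""
    try:
        ast.parse(source)
        return True
    except SyntaxError:
        return False


print(is_valid_python(with_stray_line))     # False
print(is_valid_python(without_stray_line))  # True
```

Deleting that one line should be enough to get rid of the SyntaxError; the TypeError from the first attempt is a separate problem.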
If you don't have lxml, you can install it by running:
pip install lxml
Then import it and use it like this:
from bs4 import BeautifulSoup
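import lxml  # optional: bs4 looks up lxml by name, so this import only confirms it is installed
soup = BeautifulSoup(html, "lxml")  # html is your markup string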
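The parameters of the BeautifulSoup constructor are:
markup="", features=None, builder=None, parse_only=None, from_encoding=None, exclude_encodings=None, and **kwargs.
Answer 1 (score: -1)
Instead of html, you need to pass the HTML text of the page, as shown below: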
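As a rough illustration of the first two parameters in that list, the sketch below passes a markup string and names the parser explicitly, which is what the warning in the question asks for. The sample markup is invented for the example, and `html.parser` is used so that nothing beyond bs4 itself has to be installed.

```python
from bs4 import BeautifulSoup

# Placeholder markup; in the question this would be the downloaded page.
html = (
    "<html><head><title>Example</title></head>"
    "<body><a href='/a'>A</a></body></html>"
)

# Naming the parser explicitly avoids the "No parser was explicitly specified" warning.
soup = BeautifulSoup(html, "html.parser")

print(soup.title.string)                            # Example
print([a.get("href") for a in soup.find_all("a")])  # ['/a']
```

Swapping `"html.parser"` for `"lxml"` should work the same way once lxml is installed.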
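import requests
from bs4 import BeautifulSoup
# .text is the response body as a string, which is what BeautifulSoup expects
request = requests.get("http://www.flipkart.com/search").text
soup = BeautifulSoup(request, "html.parser")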
Hope this helps :)