Why do I get so many warnings and some errors when using IMDbPY?

Time: 2011-03-17 22:38:22

Tags: python warnings imdb

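I'm using IMDbPY to retrieve data from IMDb. I get the correct results and everything looks fine, except for one thing: no matter what I do, I get warnings. The results are right, but they only show up after a long list of warnings, and sometimes an error.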

For example, the following code should print Pulp Fiction (1994):

import imdb

db = imdb.IMDb()
# Take the first search hit, then fetch its full details
movie_obj = db.search_movie('pulp fiction')[0]
db.update(movie_obj)
print(movie_obj['long imdb canonical title'])

It does, but not before the following warnings and errors appear:

2011-03-18 00:33:11,490 WARNING [imdbpy.parser.http.domparser] C:\Python27\lib\site-packages\imdb\parser\http\utils.py:459: unable to use "lxml": No module named lxml.html
2011-03-18 00:33:11,507 WARNING [imdbpy.parser.http.domparser] C:\Python27\lib\site-packages\imdb\parser\http\utils.py:450: falling back to "beautifulsoup"
2011-03-18 00:33:13,483 WARNING [imdbpy.parser.http.domparser] C:\Python27\lib\site-packages\imdb\parser\http\utils.py:459: unable to use "lxml": No module named lxml.html
2011-03-18 00:33:13,483 WARNING [imdbpy.parser.http.domparser] C:\Python27\lib\site-packages\imdb\parser\http\utils.py:450: falling back to "beautifulsoup"
2011-03-18 00:33:15,137 ERROR [imdbpy.parser.http.domparser] C:\Python27\lib\site-packages\imdb\parser\http\utils.py:566: DOMHTMLMovieParser: caught exception extracting XPath "//div[@id='tn15title']//span[starts-with(text(), 'TV series')]"
Traceback (most recent call last):
  File "C:\Python27\lib\site-packages\imdb\parser\http\utils.py", line 555, in xpath
    xpath_result = element.xpath(path)
  File "C:\Python27\lib\site-packages\imdb\parser\http\bsouplxml\etree.py", line 57, in xpath
    return path.apply(node)
  File "C:\Python27\lib\site-packages\imdb\parser\http\bsouplxml\bsoupxpath.py", line 113, in apply
    nodes = step.apply(nodes)
  File "C:\Python27\lib\site-packages\imdb\parser\http\bsouplxml\bsoupxpath.py", line 287, in apply
    found = filter(checker, found)
  File "C:\Python27\lib\site-packages\imdb\parser\http\bsouplxml\bsoupxpath.py", line 331, in __call__
    return self.__filter(node)
  File "C:\Python27\lib\site-packages\imdb\parser\http\bsouplxml\bsoupxpath.py", line 360, in __starts_with
    first = node.contents[0]
IndexError: list index out of range
2011-03-18 00:33:16,785 WARNING [imdbpy.parser.http.domparser] C:\Python27\lib\site-packages\imdb\parser\http\utils.py:459: unable to use "lxml": No module named lxml.html
2011-03-18 00:33:16,785 WARNING [imdbpy.parser.http.domparser] C:\Python27\lib\site-packages\imdb\parser\http\utils.py:450: falling back to "beautifulsoup"
2011-03-18 00:33:16,849 WARNING [imdbpy.parser.http.domparser] C:\Python27\lib\site-packages\imdb\parser\http\utils.py:459: unable to use "lxml": No module named lxml.html
2011-03-18 00:33:16,849 WARNING [imdbpy.parser.http.domparser] C:\Python27\lib\site-packages\imdb\parser\http\utils.py:450: falling back to "beautifulsoup"

Why am I getting this? Am I doing something wrong?

1 answer:

Answer 0: (score: 2)

Well, it's pretty self-explanatory:

    unable to use "lxml": No module named lxml.html

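Can you do the following to check whether the module exists?

  1. In a terminal or command prompt, run python
  2. Post the first line of the output (e.g. Python 2.6.6 (r266...).
  3. In the shell, type import lxml
  4. Next, try import lxml.html

For me, this looks like: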

    blender@desktop:~$ python
    Python 2.6.6 (r266:84292, Sep 15 2010, 16:22:56) 
    [GCC 4.4.5] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import lxml
    >>> import lxml.html
    >>> 
    

I have the module installed, so I get no output (the import succeeded).
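
If you'd rather script the check than type it into the interpreter, here is a minimal sketch. The helper name `check_module` is my own invention (it is not part of IMDbPY or lxml), and it assumes Python 2.7 or later, since it relies on `importlib` from the standard library.

```python
import importlib
import sys


def check_module(name):
    """Try to import `name`; return (True, "ok") or (False, error message).

    Hypothetical helper for illustration only.
    """
    try:
        importlib.import_module(name)
    except ImportError as exc:
        return False, str(exc)
    return True, "ok"


if __name__ == "__main__":
    # First line of the interpreter banner, as asked for in step 2
    print(sys.version.splitlines()[0])
    # Same two imports as steps 3 and 4
    for name in ("lxml", "lxml.html"):
        ok, detail = check_module(name)
        print("%-10s %s" % (name, "OK" if ok else "MISSING: " + detail))
```

If `lxml.html` is reported as missing, that matches the warning in your log, and installing lxml for the same Python installation should stop IMDbPY from falling back to "beautifulsoup".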