Readability lxml XPath not working

Date: 2017-02-15 06:17:19

Tags: python scrapy

I am trying to retrieve some items while reading pages with readability and Scrapy. I wrote this code:

titles = response.xpath("//a[@class='media__link']").extract()
#titles = response.xpath('//a/@href').extract()
print ("%d links was found" %len(titles))


count=0
for title in titles:
    item = TutsplusItem()
    item["title"] = title
    print("Title is : %s" %title)
    yield item
    titleInner = Document(title)
    link = titleInner.xpath("//a/@href")
    link = "http://www.bbc.com" + link
    response = requests.get(link)
    doc = Document(response)

    title=doc.xpath("//title/text()")
    headline=doc.xpath("//p[@class='story-body__introduction']/text()")
    bodyText=doc.xpath("//div[class='story-body__inner']/text()")

However, running xpath on the readability Document fails at this line:

link = titleInner.xpath("//a/@href")

The error is:

Traceback (most recent call last):
  File "c:\python27\lib\site-packages\scrapy-1.3.1-py2.7.egg\scrapy\utils\defer.py", line 102, in iter_errback
    yield next(it)
  File "c:\python27\lib\site-packages\scrapy-1.3.1-py2.7.egg\scrapy\spidermiddlewares\offsite.py", line 29, in process_spider_output
    for x in result:
  File "c:\python27\lib\site-packages\scrapy-1.3.1-py2.7.egg\scrapy\spidermiddlewares\referer.py", line 22, in
    return (_set_referer(r) for r in result or ())
  File "c:\python27\lib\site-packages\scrapy-1.3.1-py2.7.egg\scrapy\spidermiddlewares\urllength.py", line 37, in
    return (r for r in result or () if _filter(r))
  File "c:\python27\lib\site-packages\scrapy-1.3.1-py2.7.egg\scrapy\spidermiddlewares\depth.py", line 58, in
    return (r for r in result or () if _filter(r))
  File "C:\Users\Mehdi\PycharmProjects\WebCrawler\src\Crawler.py", line 69, in parse
    link = titleInner.xpath("//a/@href")
TypeError: Type '' cannot be serialized.

I can't figure out what the problem is.

1 answer:

Answer 0 (score: 0)

I am avoiding readability and using lxml instead!
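The answer gives no code, so here is a minimal sketch of the same extraction done with `lxml.html` alone. It assumes lxml is installed and reuses the XPath queries and the `http://www.bbc.com` base URL from the question. The helper names `extract_links` and `parse_article` are hypothetical. So is the assumption that the article body is a series of `<p>` children of the `story-body__inner` div, so check it against the live markup. Downloading is left out, which means the functions work on any HTML string.

```python
# Sketch only: parse with lxml.html instead of readability's Document.
# Assumes lxml is installed; fetching the pages is left to the caller.
try:
    from urllib.parse import urljoin  # Python 3
except ImportError:  # Python 2.7, which the question uses
    from urlparse import urljoin

import lxml.html

BASE_URL = "http://www.bbc.com"


def extract_links(listing_html, base_url=BASE_URL):
    """Return absolute URLs for every <a class="media__link"> on a listing page."""
    tree = lxml.html.fromstring(listing_html)
    # An attribute XPath returns a *list* of strings (one per match),
    # so each href is joined to the base URL individually.
    hrefs = tree.xpath("//a[@class='media__link']/@href")
    return [urljoin(base_url, href) for href in hrefs]


def parse_article(article_html):
    """Pull the title, intro paragraph and body text out of an article page."""
    tree = lxml.html.fromstring(article_html)
    # string(...) yields the text of the first match, or "" if nothing matches.
    title = tree.xpath("string(//title)")
    headline = tree.xpath("string(//p[@class='story-body__introduction'])")
    # Note the '@' before 'class'; without it the predicate tests for a
    # child element named 'class' and matches nothing.
    # Hypothetical: the body is assumed to be <p> children of this div.
    paragraphs = tree.xpath("//div[@class='story-body__inner']/p")
    return {
        "title": title.strip(),
        "headline": headline.strip(),
        # text_content() includes text from nested tags such as <b> or <a>.
        "body": "\n".join(p.text_content().strip() for p in paragraphs),
    }
```

In the spider, `extract_links(response.text)` would take the place of the loop over `titles`, and the HTML of each downloaded article would go to `parse_article`.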