I have seen similar questions on this forum, but this one is different from those.
I have an Item class like this ......
The Item class:
class NewCarItem(Item):
    car_petrol_engine_type = Item()
    car_petrol_engine_size = Item()
    car_petrol_engine_max_power = Item()
    car_petrol_engine_max_torque = Item()
    car_petrol_engine_fuel_supply_system = Item()
    car_diesel_engine_type = Item()
    car_diesel_engine_size = Item()
    car_diesel_engine_max_power = Item()
    car_diesel_engine_max_torque = Item()
    car_diesel_engine_fuel_supply_system = Item()
    car_transmission_type = Item()
    car_suspension_front = Item()
    car_suspension_rear = Item()
    car_dimension_overall_length_width_height = Item()
    car_dimension_wheel = Item()
    car_dimension_fuel_tank_capacity = Item()
    car_dimension_turning_circle_radius = Item()
    car_dimension_boot_space = Item()
    car_dimension_tyre = Item()
    car_tyre_is_tube_less = Item()
My spider class is shown below. When I try to run the spider with the command
scrapy crawl new_car_spider
I get an error.
from scrapy import Request
from scrapy.selector import HtmlXPathSelector
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
from scrapy.contrib.spiders import CrawlSpider, Rule
from car_planet.items import NewCarItem
from car_planet.lib.html_utils import *

class NewCarSpiderSpider(CrawlSpider):
    name = 'new_car_spider'
    allowed_domains = ['toyotabharat.com']
    start_urls = ['http://www.toyotabharat.com/cars/new_cars/']
    rules = (
        Rule(SgmlLinkExtractor(allow=r'Items/'), callback='parse_item', follow=True),
    )

    def parse(self, response):
        hxs = HtmlXPathSelector(response)
        urls = hxs.select("//div[@id='sr-img']/a/@href").extract()
        items = []
        for url in urls:
            formed_url = "http://www.toyotabharat.com"+get_matched_strings("^(.*[\\/])",url)[0]+"spec_org.aspx"
            yield Request(formed_url,callback=self.parse_level_one)

    def parse_level_one(self,response):
        hxs = HtmlXPathSelector(response)
        meta_tags = hxs.select("//meta").extract()
        item = NewCarItem()
        item['url'] = response.url
        return item
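Answer 0: (score: 0)
This looks like some kind of misconfiguration. Possible causes:
the package is not installed;
it is not on PYTHONPATH, so Python cannot find it;
a special case of the previous point: you have several Python versions installed and the library is not available to the one you are running (for example, you installed it only for Python 3 and are trying to import it from Python 2);
and so on.
To debug this, run import sys; print(sys.path)
and check whether the library you are trying to import is available in one of the listed directories.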
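The check described above can be sketched as a small script. This is only an illustration, assuming Python 3; the helper name describe_import is made up for this example and is not part of Scrapy or the standard library. It prints which interpreter is running, lists the search path, and reports where a top-level package would be loaded from.

```python
import importlib.util
import sys


def describe_import(module_name):
    """Return the file a top-level module would be loaded from, or None if it cannot be found.

    Hypothetical helper for illustration; pass a top-level name such as "scrapy".
    """
    spec = importlib.util.find_spec(module_name)
    return None if spec is None else spec.origin


# Which interpreter is actually running this script?
print("interpreter:", sys.executable)

# Directories searched on import, in order.
for entry in sys.path:
    print("path entry:", entry)

# None here means this interpreter cannot see the package at all.
print("scrapy found at:", describe_import("scrapy"))
```

If the last line prints None, the package is not installed for the interpreter shown in the first line, which matches the first three causes listed above.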
Answer 1: (score: 0)
You are importing Request from the wrong place. It lives in scrapy.http. Change the import like this:
from scrapy.http import Request
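If you are not sure which module exposes a name in the Scrapy version you have installed, a quick probe like the one below may help. It is a rough sketch, not part of Scrapy: first_importable is a hypothetical helper written for this example, and the list of candidate modules is an assumption based on the two import lines that appear in this thread.

```python
import importlib


def first_importable(name, module_paths):
    """Return (module_path, obj) for the first module in module_paths that exposes name.

    Returns (None, None) when no candidate module can be imported or none has the name.
    """
    for module_path in module_paths:
        try:
            module = importlib.import_module(module_path)
        except ImportError:
            # Module (or its parent package) is missing; try the next candidate.
            continue
        if hasattr(module, name):
            return module_path, getattr(module, name)
    return None, None


# Candidate locations taken from the question and from this answer.
found_in, request_cls = first_importable("Request", ["scrapy", "scrapy.http"])
print("Request found in:", found_in)
```

If this prints None, Scrapy itself is not importable from the current interpreter, and the problem is the installation rather than the import line.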