Checking one array against another in Python and Scrapy

Asked: 2015-04-12 09:06:54

Tags: python arrays if-statement scrapy

I am trying to set a variable in Python to a string element from one array, based on which string element from another array is currently being used. Unfortunately, I don't know how to do this.

Here are the two arrays:

genre = ["Dance",
    "Festivals",
    "Rock/pop"
    ]

I am trying to print the genre that corresponds to whichever of these three elements from the other array is being used, i.e. when start_urls[0] is being crawled, genre[0] should be printed (see the sketch after the list below):

start_urls = [
    "http://www.allgigs.co.uk/whats_on/London/clubbing-1.html",
    "http://www.allgigs.co.uk/whats_on/London/festivals-1.html",
    "http://www.allgigs.co.uk/whats_on/London/tours-1.html"
] 
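To make the pairing concrete, here is a tiny standalone sketch (outside Scrapy, just for illustration) of how the two lists are meant to line up, with genre[i] describing start_urls[i]:

genre = ["Dance", "Festivals", "Rock/pop"]
start_urls = [
    "http://www.allgigs.co.uk/whats_on/London/clubbing-1.html",
    "http://www.allgigs.co.uk/whats_on/London/festivals-1.html",
    "http://www.allgigs.co.uk/whats_on/London/tours-1.html",
]

# The genre at each index is meant to label the start URL at the same index:
for g, url in zip(genre, start_urls):
    print g, '->', url
# Dance -> .../clubbing-1.html
# Festivals -> .../festivals-1.html
# Rock/pop -> .../tours-1.html

What I can't work out is how to print only the genre for the start URL the spider is currently working on.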

Full code:

genre = ["Dance",
    "Festivals",
    "Rock/pop"
    ]

class AllGigsSpider(CrawlSpider):
    name = "allGigs" # Name of the Spider. In command promt, when in the correct folder, enter "scrapy crawl Allgigs".
    allowed_domains = ["www.allgigs.co.uk"] # Allowed domains is a String NOT a URL. 
    start_urls = [
        "http://www.allgigs.co.uk/whats_on/London/clubbing-1.html",
        "http://www.allgigs.co.uk/whats_on/London/festivals-1.html",
        "http://www.allgigs.co.uk/whats_on/London/tours-1.html"
    ] 

    rules = [
        Rule(SgmlLinkExtractor(restrict_xpaths='//div[@class="more"]'), # Follow the links found inside the "more" div on each page.
        callback="parse_item", 
        follow=True),
    ]

    def parse_start_url(self, response):
        return self.parse_item(response)

    def parse_item(self, response):#http://stackoverflow.com/questions/15836062/scrapy-crawlspider-doesnt-crawl-the-first-landing-page
        for info in response.xpath('//div[@class="entry vevent"]'):
            item = TutorialItem() # Extract items from the items folder.
            item ['artist'] = info.xpath('.//span[@class="summary"]//text()').extract() # Extract artist information.
            item ['date'] = info.xpath('.//span[@class="dates"]//text()').extract() # Extract date information.
            preview = ''.join(str(s) for s in item['artist']) # Join the artist strings into one search string for SoundCloud.
            #item ['genre'] = i.xpath('.//li[@class="style"]//text()').extract()
            client = soundcloud.Client(client_id='401c04a7271e93baee8633483510e263', client_secret='b6a4c7ba613b157fe10e20735f5b58cc', callback='http://localhost:9000/#/callback.html')
            tracks = client.get('/tracks', q = preview, limit=1)
            for track in tracks:
                print track.id
                for i, val in enumerate(genre):
                        print '{} {}'.format(genre[i], start_urls[i]) 
                print genre
                #for i, val in enumerate(genre):
                #       print '{} {}'.format(genre[i], start_urls[i])
                item ['trackz'] = track.id
                yield item

Any help is appreciated.

1 Answer:

Answer 0 (score: 0):

for i, val in enumerate(genre):
    print '{} {}'.format(genre[i], start_urls[i])

This should work.
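Note that inside parse_item the URL list is a class attribute, so it has to be referenced as self.start_urls, and the loop above prints every genre/URL pair rather than just the one for the page being parsed. A rough sketch of printing only the matching genre (assuming the response URL is exactly one of the start URLs; pages reached through followed links will not match and are simply skipped):

def parse_item(self, response):
    # start_urls is a class attribute, so it is reached through self.
    # Look up which start URL this response came from and print the
    # genre stored at the same index.
    try:
        i = self.start_urls.index(response.url)
        print '{} {}'.format(genre[i], self.start_urls[i])
    except ValueError:
        # The page was reached via a followed link, not a start URL.
        pass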