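I have plenty of other scripts built on similar code that work fine, but when I run this spider from cmd and open the .csv file to check the saved 'Title' column, the xpath itself has been copied into Excel instead of the titles. Any idea why?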
import scrapy

class MovieSpider(scrapy.Spider):
    name = 'movie'
    allowed_domains = ['https://www.imdb.com/search/title?start=1']
    start_urls = ['https://www.imdb.com/search/title?start=1/']

    def parse(self, response):
        titles = response.xpath('//*[@id="main"]/div/div/div[3]/div[1]/div[3]/h3/a')
        pass
        print(titles)
        for title in titles:
            yield {'Title': title}
--- I also tried the following two approaches: ---
for subject in titles:
    yield {
        'Title': subject.xpath('.//h3[@class="lister-item-header"]/a/text()').extract_first(),
        'Runtime': subject.xpath('.//p[@class="text-muted"]/span/text()').extract_first(),
        'Description': subject.xpath('.//p[@class="text-muted"]/p/text()').extract_first(),
        'Director': subject.xpath('.//*[@id="main"]/a/text()').extract_first(),
        'Rating': subject.xpath('.//div[@class="inline-block ratings-imdb-rating"]/strong/text()').extract_first()
    }
Answer 0 (score: 3)
Use extract() or extract_first(), and also use a shorter, more general notation for the xpath.
Output: