Django Dynamic Scraper ERROR "mandatory elem description missing" when scraping an article's detail page

Date: 2019-04-13 11:25:40

Tags: django screen-scraping scraper django-dynamic-scraper

For the past 5 hours I have been trying to get Django Dynamic Scraper to work, without success. Every time I try to scrape an item's detail page, I get the error: mandatory elem description missing!

I have found the same problem reported on both Stack Overflow and GitHub.

It can be seen here: Django-dynamic-scraper unable to scrape the data

And here: https://github.com/holgerd77/django-dynamic-scraper/issues/26

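However, neither solves the problem: the answer simply scrapes the title from the main page instead of from the detail page. That avoids the problem rather than solving it.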

This is my setup:

[Image: screenshot of the setup, not preserved]

If you open the article page in question at https://cryptonews.com/news/bitcoin-and-altcoins-showing-signs-of-weakness-3673.htm and inspect the DOM in the browser console, $x("//div[@class='cn-content']/p") returns the p elements.

So the XPath should be correct. Yet I still get this log output:

2019-04-13 11:05:22 [dds] INFO: Starting to crawl item 33 from page 1(0).
2019-04-13 11:05:22 [dds] INFO: --------------------------------------------------------------------------------------
2019-04-13 11:05:22 [dds] INFO: MP   HTML|GET      title                1(0)-33 OKEx Announced its First Token Sale via IEO
2019-04-13 11:05:22 [dds] INFO: MP   HTML|GET      url                  1(0)-33 https://cryptonews.com/news/okex-announced-its-first-token-sale-via-ieo-3647.htm
2019-04-13 11:05:22 [dds] INFO: MP   HTML|GET      img_url              1(0)-33 https://cimg.co/w/articles/4/5ca/71a18df47d.jpg
2019-04-13 11:05:22 [dds] INFO: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
2019-04-13 11:05:22 [dds] INFO: Calling DP3 URL for item 1(0)-33...
2019-04-13 11:05:22 [dds] INFO: URL     : https://cryptonews.com/news/okex-announced-its-first-token-sale-via-ieo-3647.htm
2019-04-13 11:05:22 [dds] INFO: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
2019-04-13 11:05:22 [dds] ERROR: Item 1(0)-23 dropped, mandatory elem description missing!
2019-04-13 11:05:22 [dds] ERROR: Item 1(0)-24 dropped, mandatory elem description missing!
2019-04-13 11:05:22 [dds] ERROR: Item 1(0)-15 dropped, mandatory elem description missing!
2019-04-13 11:05:22 [dds] ERROR: Item 1(0)-25 dropped, mandatory elem description missing!
2019-04-13 11:05:22 [dds] ERROR: Item 1(0)-28 dropped, mandatory elem description missing!
2019-04-13 11:05:22 [dds] ERROR: Item 1(0)-17 dropped, mandatory elem description missing!
2019-04-13 11:05:22 [dds] ERROR: Item 1(0)-26 dropped, mandatory elem description missing!
2019-04-13 11:05:22 [dds] ERROR: Item 1(0)-27 dropped, mandatory elem description missing!
2019-04-13 11:05:22 [dds] ERROR: Item 1(0)-29 dropped, mandatory elem description missing!
2019-04-13 11:05:22 [dds] ERROR: Item 1(0)-30 dropped, mandatory elem description missing!
2019-04-13 11:05:22 [dds] ERROR: Item 1(0)-31 dropped, mandatory elem description missing!
2019-04-13 11:05:22 [dds] ERROR: Item 1(0)-33 dropped, mandatory elem description missing!
2019-04-13 11:05:23 [dds] ERROR: Item 1(0)-32 dropped, mandatory elem description missing!
2019-04-13 11:05:23 [scrapy.core.engine] INFO: Closing spider (finished)
2019-04-13 11:05:23 [dds] INFO: Closing Django DB connection.
2019-04-13 11:05:23 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 12071,
 'downloader/request_count': 35,
 'downloader/request_method_count/GET': 35,
 'downloader/response_bytes': 376126,
 'downloader/response_count': 35,
 'downloader/response_status_count/200': 34,
 'downloader/response_status_count/301': 1,
 'finish_reason': 'finished',
 'finish_time': datetime.datetime(2019, 4, 13, 11, 5, 23, 39450),
 'item_dropped_count': 33,
 'item_dropped_reasons_count/DropItem': 33,
 'log_count/ERROR': 33,
 'log_count/INFO': 379,
 'memusage/max': 66011136,
 'memusage/startup': 66011136,
 'request_depth_max': 1,
 'response_received_count': 34,
 'scheduler/dequeued': 35,
 'scheduler/dequeued/memory': 35,
 'scheduler/enqueued': 35,
 'scheduler/enqueued/memory': 35,
 'start_time': datetime.datetime(2019, 4, 13, 11, 5, 18, 861139)}
2019-04-13 11:05:23 [scrapy.core.engine] INFO: Spider closed (finished)
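
A match in the browser console only shows that the XPath works against the DOM the browser has built, which may differ from the raw HTML the scraper downloads (for example, if scripts modify the page after loading). Below is a minimal sketch of checking the same expression outside the browser. It assumes a hypothetical, well-formed sample page and uses Python's standard-library ElementTree as a stand-in for Scrapy's selectors. The names SAMPLE_DETAIL_PAGE and extract_description are illustrative only and are not part of Django Dynamic Scraper.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for the raw HTML of a detail page.
# Real pages are usually not valid XML and would need an HTML-aware parser.
SAMPLE_DETAIL_PAGE = """
<html>
  <body>
    <div class="cn-content">
      <p>First paragraph.</p>
      <p>Second paragraph.</p>
    </div>
  </body>
</html>
"""


def extract_description(markup):
    """Join the text of every <p> directly under div.cn-content."""
    root = ET.fromstring(markup)
    # ElementTree needs a relative path (".//") when searching from an element;
    # this is the counterpart of "//div[@class='cn-content']/p" in the console.
    paragraphs = root.findall(".//div[@class='cn-content']/p")
    return " ".join("".join(p.itertext()).strip() for p in paragraphs)


description = extract_description(SAMPLE_DETAIL_PAGE)
# An empty string here would correspond to a missing "description" value.
print(repr(description))
```

If the same check returned an empty string on the HTML actually downloaded by the spider, that would suggest the content is not present in the raw response, even though the console finds it in the rendered DOM.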

Can anyone help me?

0 answers