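My script is very simple: I have a link to a PDF file, I download the file with the request module, and then I parse it into JSON with the pdf2json module. Finally, I store the result in a database.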
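The flow described above can be sketched roughly as follows. This is only an illustrative sketch, not the actual script: `processLinks`, `download`, `parse`, and `store` are hypothetical names standing in for the request, pdf2json, and database calls, and handling one link at a time is an assumption made for the sketch.

```javascript
// Hypothetical sketch of the download -> parse -> store flow.
// download, parse and store are stand-ins supplied by the caller;
// they are NOT the real request / pdf2json / database APIs.
async function processLinks(links, { download, parse, store }) {
  const failed = [];
  for (const link of links) {
    try {
      const pdfBuffer = await download(link); // fetch the raw PDF bytes
      const json = await parse(pdfBuffer);    // turn the PDF into a JSON object
      await store(link, json);                // persist the parsed result
    } catch (err) {
      // Record the failure and continue with the next link.
      failed.push({ link, message: err.message });
    }
  }
  return failed;
}
```

Because each iteration awaits all three steps before moving on, this version keeps at most one downloaded buffer and one parsed document referenced at a time.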
The problem is that I have nearly 500 PDF files to download and parse, and I get a fatal error. Here is the full message:
<--- Last few GCs --->
995553 ms: Mark-sweep 1348.6 (1434.6) -> 1345.6 (1434.6) MB, 794.1 / 0 ms [allocation failure] [GC in old space requested].
996352 ms: Mark-sweep 1345.6 (1434.6) -> 1344.8 (1434.6) MB, 799.4 / 0 ms [allocation failure] [GC in old space requested].
997162 ms: Mark-sweep 1344.8 (1434.6) -> 1344.8 (1434.6) MB, 810.5 / 0 ms [last resort gc].
997969 ms: Mark-sweep 1344.8 (1434.6) -> 1344.6 (1434.6) MB, 806.3 / 0 ms [last resort gc].
<--- JS stacktrace --->
==== JS stack trace =========================================
Security context: 0000024BFAFC9E79 <JS Object>
1: charsToGlyphs(aka Font_charsToGlyphs) [0000024BFAF04189 <undefined>:~13053]
[pc=000002551914FF4D] (this=00000086B91FE319 <a Font with map 000003EDDCDF64C9>,
chars=000001F2B2EDD0C9 <String[58]: ency liabilities. This may induce spending cuts.
It also w>)
2: handleText(aka PartialEvaluator_handleText) [0000024BFAF04189 <undefined> :~7176]
[pc=00000255188CDC3B] (this=000001F2B2E89841 <a...
FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - JavaScript heap out of memory
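Is this error happening because the files are too large, or is there some optimization I should make in my code? Or perhaps there is another cause...
Edit:
This is the file that does the parsing: http://pastebin.com/q1ryuB4Z and this is where I call the function: http://pastebin.com/fsPMXmwt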