I'm trying to parse web pages with PhantomJS.
For example:
WebPage:
link1,
link2,
link3,
link4,
link5
nextPage
I'm using this code for the page:
var parsePage = function(links) {
    // parse every link
    for (var i = 0; i < links.length; i++) {
        parsePost(links[i]);
    }
};
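parsePost: I extract some information from the page, e.g. all emails and phone numbers via regular expressions, and this takes a lot of time.
But PhantomJS (JS) is asynchronous, so it does not wait until all links are parsed before going to nextPage. It goes something like this: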
- parsing page1
- parsing link1
- parsing link2
....
- parsing link5
- parsing page2
- parsing link1
....
- parsing link5
-> and only now do the results from the parsed page1 -> link1 arrive in the console
.....
- parsing page3
As a result, it uses 6 GB of memory within 3 minutes :DDD
How can I solve this?
What I have tried or considered:
1. Maybe limit the program's memory use? (Would it wait until some processes have finished and then continue parsing other pages?)
2. I tried doing something like:
> page.open(link, function(... here is pageparser (which parses every link))
and then page.close()
but pageparser takes a lot of time, so when I call page.close() it stops the pageparser process.
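
To make the behaviour I'm after concrete, here is a minimal sketch of processing the links one at a time, so the next link (and later nextPage) only starts after the previous one has reported that it is finished. `processSequentially` and `fakeParsePost` are hypothetical names I made up for illustration, not part of PhantomJS, and the slow parsing is only simulated with `setTimeout`:

```javascript
// Hypothetical helper (not a PhantomJS API): runs `worker` on one item at a
// time and only moves on to the next item after the worker calls `next`.
function processSequentially(items, worker, done) {
    var index = 0;

    function step() {
        if (index >= items.length) {
            // Every item has been handled; signal completion if requested.
            if (done) { done(); }
            return;
        }
        var item = items[index];
        index += 1;
        worker(item, step);
    }

    step();
}

// Stand-in for the slow per-link work (opening the link and running the
// regular expressions); assumed here to take 10 ms per link.
function fakeParsePost(link, next) {
    console.log('parsing ' + link);
    setTimeout(function () {
        console.log('finished ' + link);
        next();
    }, 10);
}

processSequentially(['link1', 'link2', 'link3'], fakeParsePost, function () {
    console.log('page done, safe to go to nextPage');
});
```

In the real script, I assume the worker would call page.open(link, callback), do the parsing inside that callback, and call page.close() followed by next() only as the last step of the callback, so the page is not closed while pageparser is still running.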